Natural Language Processing, SoSe 2014
Question Answering
Dr. Mariana Neves, June 25th, 2014
(based on the slides of Dr. Saeedeh Momtazi)

Outline
- Introduction
- History
- QA Architecture

Motivation
- Finding small segments of text that answer users' questions

Motivation
- Example question: How many presidents has the USA had?
- 44
- List of names in chronological order (http://en.wikipedia.org/wiki/list_of_presidents_of_the_united_states_by_occupation)

Motivation
- Information retrieval
  - Keywords (short input)
  - Document (long output)
- Question answering
  - Natural language question (long input)
  - Short answer (short output)

QA Types
- Closed-domain: answering questions from a specific domain
- Open-domain: answering any domain-independent question

History
- BASEBALL [Green et al., 1963]
  - One of the earliest question answering systems
  - Developed to answer users' questions about dates, locations, and the results of baseball matches
- LUNAR [Woods, 1977]
  - Developed to answer natural language questions about the geological analysis of rocks returned by the Apollo moon missions
  - Able to answer 90% of questions in its domain posed by people not trained on the system

History
- STUDENT
  - Built to answer high-school students' questions about algebraic exercises
- PHLIQA
  - Developed to answer users' questions about European computer systems
- UC (Unix Consultant)
  - Answered questions about the Unix operating system
- LILOG
  - Was able to answer questions about tourism information for cities in Germany

Closed-domain QA
- Closed-domain systems
  - Extracting answers from structured data (databases), which are labor-intensive to build
  - Converting natural language questions to database queries, which is easy to implement

Open-domain QA
- From closed-domain QA to open-domain QA
  - Using a large collection of unstructured data (e.g., the Web) instead of databases
- Advantages
  - Many subjects are covered
  - Information is constantly added and updated
  - No manual work is required to build databases
- Problems
  - Information is not always up-to-date
  - Wrong information cannot be avoided
  - Much irrelevant information is found

Open-domain QA
- START [Katz, 1997]
  - Utilized a knowledge base to answer users' questions
  - The knowledge base was first created automatically from unstructured Internet data
  - Then it was used to answer natural language questions
  - http://start.csail.mit.edu/index.php

IBM Watson
- Playing against the two greatest Jeopardy! champions
- Challenges
  - Knowledge
  - Speed
  - Confidence
- Final Jeopardy, Stephen Baker
- Building Watson: An Overview of the DeepQA Project. Ferrucci et al. AI Magazine, 2010.
- https://www.youtube.com/watch?v=wfr3lom_xhe

Architecture
- Input question: How many presidents has the USA had?
- Natural Language Processing
  - Question Analysis
  - Question Classification
  - Query Construction
- Information Retrieval (over the corpus)
  - Document Retrieval
  - Sentence Retrieval
- Information Extraction
  - Sentence Annotation
  - Answer Extraction
  - Answer Validation (using the Web)
- Output answer: 44

Question Analysis
- Named Entity Recognition
- Surface Text Pattern Learning
- Syntactic Parsing
- Semantic Role Labeling

Question Analysis: Named Entity Recognition
- Recognizing the named entities in the text to extract the target of the question
- Using the question's target in the query construction step
- Example:
  - Question: In what country was Albert Einstein born?
  - Target: Albert Einstein

Question Analysis: Pattern Learning
- Extracting a pattern from the question
- Matching the pattern against a list of pre-defined question patterns
- Finding the corresponding answer pattern
- Identifying the position of the answer in the sentence in the answer extraction step
- Example:
  - Question: In what country was Albert Einstein born?
  - Question Pattern: In what country was X born?
  - Answer Pattern: X was born in Y.

Question Analysis: Syntactic Parsing
- Using a dependency parser to extract the syntactic relations between question terms
- Using the dependency relation paths between question terms to extract the correct answer in the answer extraction step
- (http://nlp.stanford.edu:8080/corenlp/process)

Question Analysis: Semantic Role Labeling
- FrameNet: a lexical database for English
  - More than 170,000 manually annotated sentences
- Frame Semantics: describes the type of event, relation, or entity and the participants in it
- Example: John[Cook] grills a fish[Food] on an open fire[Heating-Instrument]

Question Analysis: Semantic Role Labeling
- Frame assignment
- Role labeling
- Examples:
  - OPERATE_VEHICLE: Jim[Driver] flew his plane[Vehicle] to Texas[Goal]
  - DESTROYING: Alice[Destroyer] destroys the item[Undergoer] with a plane[Instrument]

Question Analysis: Semantic Role Labeling
- Finding the question's head verb
- Example: Who[Buyer] purchased YouTube[Goods]?
- Frame COMMERCE_BUY, with realization patterns such as:
  - Buyer [Subj,NP] verb Goods [Obj,NP]
  - Buyer [Subj,NP] verb Goods [Obj,NP] Seller [Dep,PP-from]
  - Goods [Subj,NP] verb Buyer [Dep,PP-by]
  - ...
- Example: In 2006, YouTube[Goods] was purchased by Google[Buyer] for $1.65 billion.

Question Classification
- Classifying the input question into a set of question types
- Mapping question types to the available named entity labels
- Finding strings that have the same type as the input question in the answer extraction step
- Example:
  - Question: In what country was Albert Einstein born?
  - Type: LOCATION - Country

Question Classification
- Example (NER):
  - S1: Albert Einstein was born in 14 March 1879.
  - S2: Albert Einstein was born in Germany.
  - S3: Albert Einstein was born in a Jewish family.

Question Classification
- Classification taxonomies
  - BBN
  - Pasca & Harabagiu
  - Li & Roth
    - 6 coarse- and 50 fine-grained classes
    - Coarse classes: ABBREVIATION, ENTITY, DESCRIPTION, HUMAN, LOCATION, NUMERIC

Question Classification
- Definition of the question classes: (http://cogcomp.cs.illinois.edu/data/qa/qc/definition.html)

Question Classification
- Labeled examples (coarse:fine class, then question):
  - DESC:manner  How did serfdom develop in and then leave Russia?
  - ENTY:cremat  What films featured the character Popeye Doyle?
  - ENTY:animal  What fowl grabs the spotlight after the Chinese Year of the Monkey?
  - ABBR:exp  What is the full form of .com?
  - HUM:gr  What team did baseball's St. Louis Browns become?
- (http://cogcomp.cs.illinois.edu/data/qa/qc/train_1000.label)

Question Classification
- Coarse classifier: ABBREVIATION, ENTITY, DESCRIPTION, HUMAN, LOCATION, NUMERIC
- Fine classifier (shown for HUMAN): group, ind, title, description

Question Classification
- Using any kind of supervised classifier
  - K Nearest Neighbor
  - Support Vector Machines
  - Naïve Bayes
  - Maximum Entropy
  - Logistic Regression
- Benefiting from available toolkits: Weka, SVM-light, etc.
- Considering the confidence measure of the classification to filter the result
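
The classifier idea above can be sketched with a tiny Naïve Bayes model in plain Python. This is only an illustration under stated assumptions: the training questions, the label names, and the functions `train` and `classify` are hypothetical, and a real system would train on a labeled question corpus with one of the toolkits named above.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy training set (not from the slides).
TRAIN = [
    ("In what country was Albert Einstein born?", "LOCATION"),
    ("What city hosted the Olympics?", "LOCATION"),
    ("Who purchased YouTube?", "HUMAN"),
    ("Who wrote Hamlet?", "HUMAN"),
    ("How many moons does Mars have?", "NUMERIC"),
    ("How many players are on a baseball team?", "NUMERIC"),
]

def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def train(examples):
    """Collect class counts, per-class word counts, and the vocabulary."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        class_counts[label] += 1
        for tok in tokenize(text):
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_counts, word_counts, vocab

def classify(question, model):
    """Return (best label, posterior probability used as confidence)."""
    class_counts, word_counts, vocab = model
    n = sum(class_counts.values())
    log_scores = {}
    for label, count in class_counts.items():
        total = sum(word_counts[label].values())
        score = math.log(count / n)
        for tok in tokenize(question):
            # Add-one smoothing so unseen words do not zero out a class.
            score += math.log((word_counts[label][tok] + 1) / (total + len(vocab)))
        log_scores[label] = score
    # Normalize the log scores into posteriors to obtain a confidence value.
    top = max(log_scores.values())
    exp_scores = {lab: math.exp(s - top) for lab, s in log_scores.items()}
    z = sum(exp_scores.values())
    best = max(exp_scores, key=exp_scores.get)
    return best, exp_scores[best] / z

model = train(TRAIN)
label, confidence = classify("Who founded Google?", model)
print(label, round(confidence, 3))
```

The returned confidence can be compared against a threshold to discard uncertain classifications, which is one way to read the filtering step in the last bullet.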

Query Construction
- Goal: formulating a query with a high chance of retrieving relevant documents
- Task:
  - Assigning a higher weight to the question's target
  - Using query expansion techniques to expand the query
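
A minimal sketch of the two tasks above, assuming a weighted bag-of-words query. The stopword list, the expansion table, the weights, and the function `build_query` are all hypothetical choices made for illustration.

```python
import re

# Assumed stopword list and expansion table (illustrative only).
STOPWORDS = {"in", "what", "was", "the", "a", "of", "is", "did", "do"}
EXPANSIONS = {"born": ["birthplace"]}

def build_query(question, target, target_weight=2.0, expansion_weight=0.5):
    """Map each query term to a weight; target terms get a higher weight."""
    target_terms = set(re.findall(r"\w+", target.lower()))
    query = {}
    for tok in re.findall(r"\w+", question.lower()):
        if tok in STOPWORDS:
            continue
        query[tok] = target_weight if tok in target_terms else 1.0
        # Query expansion: add related terms with a lower weight.
        for extra in EXPANSIONS.get(tok, []):
            query.setdefault(extra, expansion_weight)
    return query

print(build_query("In what country was Albert Einstein born?", "Albert Einstein"))
```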

Document Retrieval
- Importance:
  - QA components use computationally intensive algorithms
  - Time complexity of the system strongly depends on the size of the corpus to be processed
- Task:
  - Reducing the search space for the subsequent components
  - Retrieving relevant documents from a large corpus
  - Selecting the top n retrieved documents for the next steps

Document Retrieval
- Using available information retrieval models
  - Vector Space Model
  - Probabilistic Model
  - Language Model
- Using available information retrieval toolkits: Lucene, Lemur, etc.
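
To make the vector space model concrete, here is a small sketch that ranks documents by TF-IDF cosine similarity and keeps the top n. The documents, the weighting variant (raw term frequency times log(N/df)), and the function `rank_documents` are assumptions for illustration; toolkits such as Lucene use their own scoring formulas.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def rank_documents(query, docs, top_n=2):
    """Return [(doc_index, cosine_score)] for the top_n documents."""
    doc_counts = [Counter(tokenize(d)) for d in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for counts in doc_counts:
        df.update(counts.keys())
    idf = {t: math.log(n_docs / df[t]) for t in df}

    def vector(counts):
        # Terms unknown to the collection are dropped.
        return {t: tf * idf[t] for t, tf in counts.items() if t in idf}

    def cosine(u, v):
        dot = sum(w * v.get(t, 0.0) for t, w in u.items())
        norm_u = math.sqrt(sum(w * w for w in u.values()))
        norm_v = math.sqrt(sum(w * w for w in v.values()))
        return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

    q_vec = vector(Counter(tokenize(query)))
    scored = [(cosine(q_vec, vector(c)), i) for i, c in enumerate(doc_counts)]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [(i, score) for score, i in scored[:top_n]]

# Hypothetical mini-corpus.
DOCS = [
    "Albert Einstein was born in Germany",
    "The Unix operating system was developed at Bell Labs",
    "Google purchased YouTube in 2006",
]
print(rank_documents("Einstein born country", DOCS))
```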

Sentence Retrieval
- Task: finding small segments of text that contain the answer
- Benefits beyond document retrieval:
  - Documents are very large
  - Documents span different subject areas
  - The relevant information is expressed locally
  - Retrieving sentences simplifies the answer extraction step
- Main problem: sentence brevity

Sentence Retrieval
- Information retrieval models for sentence retrieval
  - Vector Space Model
  - Probabilistic Model
  - Language Model
    - Jelinek-Mercer Linear Interpolation
    - Bayesian Smoothing with Dirichlet Prior
    - Absolute Discounting
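
The language-model approach scores each sentence by the likelihood of the query under a smoothed unigram model of that sentence. The sketch below shows Jelinek-Mercer interpolation and Dirichlet-prior smoothing in their standard textbook forms; the parameter values `lam` and `mu`, the function `score_sentences`, and the decision to skip query terms that never occur in the collection are assumptions. Absolute discounting is not shown.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def score_sentences(query, sentences, method="jm", lam=0.5, mu=10.0):
    """Return log P(query | sentence) for each (non-empty) sentence."""
    sent_counts = [Counter(tokenize(s)) for s in sentences]
    collection = Counter()
    for counts in sent_counts:
        collection.update(counts)
    collection_len = sum(collection.values())

    scores = []
    for counts in sent_counts:
        length = sum(counts.values())
        log_p = 0.0
        for term in tokenize(query):
            if collection[term] == 0:
                continue  # assumption: ignore terms unseen in the collection
            p_coll = collection[term] / collection_len
            if method == "jm":
                # Jelinek-Mercer: linear interpolation with the collection model.
                p = (1 - lam) * counts[term] / length + lam * p_coll
            else:
                # Dirichlet prior: collection model acts as mu pseudo-counts.
                p = (counts[term] + mu * p_coll) / (length + mu)
            log_p += math.log(p)
        scores.append(log_p)
    return scores

SENTENCES = [
    "Albert Einstein was born in 14 March 1879.",
    "Albert Einstein was born in Germany.",
    "Albert Einstein was born in a Jewish family.",
]
for method in ("jm", "dirichlet"):
    print(method, score_sentences("Einstein Germany", SENTENCES, method=method))
```

Smoothing matters more for sentences than for documents because a short sentence rarely contains every query term, so the unsmoothed estimate would be zero for most candidates.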

Sentence Retrieval
- Approaches to overcome the sentence brevity problem:
  - Term relationship models
  - Translation model
  - Term clustering model

Sentence Annotation
- Similar to Question Analysis
- Annotating relevant sentences using linguistic analysis
  - Named entity recognition
  - Syntactic parsing
  - Semantic role labeling
- Example (NER):
  - S1: Albert Einstein was born in 14 March 1879.
  - S2: Albert Einstein was born in Germany.
  - S3: Albert Einstein was born in a Jewish family.

Answer Extraction
- Extracting candidate answers based on various information
  - Question
    - Question Analysis: patterns
    - Question Analysis: syntactic parse
    - Question Analysis: semantic roles
    - Question Classification: question type
  - Sentence
    - Sentence Annotation: all annotated data

Answer Extraction
- Using extracted patterns
- Example:
  - Question: In what country was Albert Einstein born?
  - Question Pattern: In what country was X born?
  - Answer Pattern: X was born in Y.
- Example (Pattern):
  - S1: Albert Einstein was born in 14 March 1879.
  - S2: Albert Einstein was born in Germany.
  - S3: Albert Einstein was born in a Jewish family.
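
The pattern example above can be sketched with regular expressions. The regexes and the function `extract_candidates` are hypothetical; they encode only the single question/answer pattern pair from the slide.

```python
import re

# Question pattern "In what country was X born?" and its answer pattern
# "X was born in Y." written as regular expressions (illustrative only).
QUESTION_PATTERN = re.compile(r"^In what country was (?P<X>.+) born\?$")
ANSWER_TEMPLATE = r"{X} was born in (?P<Y>.+?)\.$"

def extract_candidates(question, sentences):
    """Return the Y slot of every sentence that matches the answer pattern."""
    match = QUESTION_PATTERN.match(question)
    if not match:
        return []
    target = re.escape(match.group("X"))
    answer_re = re.compile(ANSWER_TEMPLATE.format(X=target))
    candidates = []
    for sentence in sentences:
        hit = answer_re.search(sentence)
        if hit:
            candidates.append(hit.group("Y"))
    return candidates

SENTENCES = [
    "Albert Einstein was born in 14 March 1879.",
    "Albert Einstein was born in Germany.",
    "Albert Einstein was born in a Jewish family.",
]
print(extract_candidates("In what country was Albert Einstein born?", SENTENCES))
```

All three sentences match the surface pattern, so the pattern alone yields three candidates; further evidence such as the expected answer type is needed to choose among them.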

Answer Extraction
- Using question type and entity type
- Example:
  - Question: In what country was Albert Einstein born?
  - Question Type: LOCATION - Country
- Example (NER):
  - S1: Albert Einstein was born in 14 March 1879.
  - S2: Albert Einstein was born in Germany.
  - S3: Albert Einstein was born in a Jewish family.
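
A sketch of filtering candidates by entity type. The tiny gazetteer, the date regex, the label strings, and the functions `entity_type` and `filter_by_type` are stand-ins for a real named entity recognizer and are assumptions made for illustration.

```python
import re

# Hypothetical gazetteer and date pattern standing in for an NER component.
COUNTRIES = {"Germany", "France", "Italy"}
DATE_RE = re.compile(r"^\d{1,2} [A-Z][a-z]+ \d{4}$")

def entity_type(text):
    """Assign a coarse:fine type label to a candidate string."""
    if text in COUNTRIES:
        return "LOCATION:country"
    if DATE_RE.match(text):
        return "NUMERIC:date"
    return "OTHER"

def filter_by_type(candidates, question_type):
    """Keep only candidates whose entity type equals the question type."""
    return [c for c in candidates if entity_type(c) == question_type]

candidates = ["14 March 1879", "Germany", "a Jewish family"]
print(filter_by_type(candidates, "LOCATION:country"))
```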

Answer Extraction
- Using syntactic parsing
  - Different wordings possible, but similar syntactic structure
  - (based on the work by Dan Shen)

Answer Extraction
- Using syntactic parsing
  - Many syntactic variations, so a robust matching approach is needed
  - (based on the work by Dan Shen)

Answer Extraction
- Using semantic roles
- Example: Who[Buyer] purchased YouTube[Goods]?
- Example: In 2006, YouTube[Goods] was purchased by Google[Buyer] for $1.65 billion.
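
Once the question and the sentence carry role labels, answer extraction reduces to aligning roles. The sketch below assumes the role annotations are already available as dictionaries; the function `answer_from_roles` and the dictionary representation are hypothetical, and no semantic role labeler is implemented here.

```python
def answer_from_roles(question_roles, sentence_roles,
                      wh_words=("who", "what", "which", "whom")):
    """Return the sentence filler of the role the question asks about.

    The asked role is the one filled by a wh-word in the question. Every
    other question role must have the same filler in the sentence.
    """
    asked = [role for role, filler in question_roles.items()
             if filler.lower() in wh_words]
    if len(asked) != 1:
        return None
    for role, filler in question_roles.items():
        if role != asked[0] and sentence_roles.get(role) != filler:
            return None
    return sentence_roles.get(asked[0])

# Role annotations taken from the two examples above.
question = {"Buyer": "Who", "Goods": "YouTube"}
sentence = {"Goods": "YouTube", "Buyer": "Google"}
print(answer_from_roles(question, sentence))
```

Because the match is on roles rather than word order, the passive sentence answers the active question without any extra rule.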

Answer Validation
- Using the Web as a knowledge resource for validating answers
- Required steps
  - Query creation
  - Answer rating

Answer Validation
- Query creation
  - Combining the answer with a subset of the question keywords
  - Choosing different combinations of subsets
    - Bag-of-words
    - Noun phrase chunks
    - Declarative form
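
A sketch of the bag-of-words variant of query creation: each query joins the candidate answer with one subset of the question keywords. The function `validation_queries` is hypothetical; the noun-phrase and declarative forms would need a chunker or a question rewriter and are not shown.

```python
from itertools import combinations

def validation_queries(answer, keywords, min_size=1):
    """Build one query per keyword subset, each including the answer."""
    queries = []
    for size in range(min_size, len(keywords) + 1):
        for subset in combinations(keywords, size):
            queries.append(" ".join(subset + (answer,)))
    return queries

for query in validation_queries("Germany", ["Albert", "Einstein", "born"]):
    print(query)
```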

Answer Validation
- Answer rating
  - Passing the query to a search engine
  - Analyzing the result of the search engine
    - Counting the results
    - Parsing the result snippets
- Other possibilities:
  - Using knowledge bases to find relations between the question keywords and the answer
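
The result-counting idea can be sketched without a real search engine by counting matches in a small in-memory list of snippets. The snippet list, the substring matching, the normalization, and the functions `hit_count` and `rate_answers` are assumptions for illustration; a real system would send the query to a search engine and read its reported result count.

```python
def hit_count(terms, snippets):
    """Count snippets that contain every term (case-insensitive substring)."""
    lowered = [t.lower() for t in terms]
    return sum(all(t in s.lower() for t in lowered) for s in snippets)

def rate_answers(candidates, keywords, snippets):
    """Rate each candidate by its share of the total hit count."""
    counts = {c: hit_count(list(keywords) + [c], snippets) for c in candidates}
    total = sum(counts.values())
    return {c: (n / total if total else 0.0) for c, n in counts.items()}

# Toy snippets standing in for search results.
SNIPPETS = [
    "Albert Einstein was born in Ulm, Germany.",
    "Albert Einstein was born in Germany.",
    "Albert Einstein was born in 14 March 1879.",
]
print(rate_answers(["Germany", "France"], ["Einstein", "born"], SNIPPETS))
```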

Further Reading
- Speech and Language Processing, Chapters 23.1, 23.2