Deliverable D1.4 Report Describing Integration Strategies and Experiments
DEEPTHOUGHT
Hybrid Deep and Shallow Methods for Knowledge-Intensive Information Extraction
Deliverable D1.4
Report Describing Integration Strategies and Experiments
The Consortium
October 2004
Report Describing Integration Strategies and Experiments D1.4

Project ref. no.:
Project acronym:
Project full title: Hybrid Deep and Shallow Methods for Knowledge-Intensive Information Extraction
Security (distribution level): Public
Contractual date of delivery:
Actual date of delivery:
Deliverable number: D1.4
Deliverable name: Report Describing Integration Strategies and Experiments
Type: Report
Status & version: Final
Number of pages: 9
WP contributing to the deliverable: WP1b
WP / Task responsible: WP1b
Other contributors:
Author(s): John Carroll, Alex Fang, Melanie Siegel
EC Project Officer: Evangelia Markidou
Keywords: Hybrid NLP, Named-Entity Recognition, architecture
Abstract: The implemented strategies for hybrid NLP are described and examples are given using screenshots.
Content
1 Integration Strategies in the Heart of Gold
2 Mobile Phone Name Recognition for English
  2.1 Input and Output Specification
  2.2 Construction of an annotated sub-corpus
  2.3 The recognition program
  2.4 A quantitative evaluation of the mobile phone name recogniser
  2.5 Error Analysis
3 References
1 Integration Strategies in the Heart of Gold

Implemented strategies for hybrid NLP in the project include:

- The analysis results of NLP tools at lower processing levels can be used by components at higher levels.
  o For example, the deep linguistic analysis module PET uses default lexicon entries for the named entities that the named-entity recogniser SProUT delivers.
  o For example, the deep linguistic analysis module PET uses default lexicon entries for the part-of-speech tags that the POS tagger TnT delivers.
- Deliver the deepest result found. If a module of the required depth cannot deliver a result, deliver the next deepest result. This is the approach that the autoresponse application mainly follows.
- Deliver partial results whenever a complete analysis is not available. Partial results are taken from the deepest module that delivers results.
- Combine modules and grammars for different languages. Each language has its own configuration of valid modules and grammars.
- The different modules use a compatible output formalism, RMRS.
  o In the case of shallower modules, this robust semantic structure allows for underspecification of, e.g., argument structure.
- Refine the data provided by shallower modules through deep parsing. This is a strategy that the Business Intelligence and Autoresponse applications use. Chunk processing and named-entity recognition are used to find relevant information sources; deep processing is then applied to the information snippets found, either to verify or to filter the extracted information.
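The "deliver the deepest result found" strategy can be illustrated with a minimal Python sketch. It is not the Heart of Gold implementation: the module tuples, the integer depth scale and the toy analysers are all hypothetical stand-ins, chosen only to show the fallback order.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical module interface: an analyser takes a text and returns an
# analysis, or None if it cannot deliver a result.
Analyser = Callable[[str], Optional[str]]
Module = Tuple[int, str, Analyser]  # (depth, name, analyse)


def deepest_result(text: str, modules: List[Module],
                   required_depth: int) -> Optional[Tuple[str, str]]:
    """Try modules from the required depth downwards; return the first result."""
    eligible = [m for m in modules if m[0] <= required_depth]
    for _depth, name, analyse in sorted(eligible, key=lambda m: m[0], reverse=True):
        result = analyse(text)
        if result is not None:
            return name, result
    return None  # no module at or below the required depth delivered anything


# Toy stand-ins for a deep parser and a shallow chunker (assumptions).
def toy_deep_parser(text: str) -> Optional[str]:
    # Pretend the deep grammar fails on an unknown model number.
    return None if "t68is" in text else f"deep({text})"


def toy_shallow_chunker(text: str) -> Optional[str]:
    return f"chunks({text})"


MODULES: List[Module] = [
    (1, "shallow", toy_shallow_chunker),
    (3, "deep", toy_deep_parser),
]

if __name__ == "__main__":
    print(deepest_result("hello", MODULES, required_depth=3))
    print(deepest_result("sony t68is", MODULES, required_depth=3))
```

In this sketch the second call falls back to the shallow module because the toy deep parser returns no result.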
2 Mobile Phone Name Recognition for English

We describe below the construction and evaluation of a module for named entity recognition of mobile phone names. The module was integrated into the RASP English shallow analysis system, which in turn forms part of the Heart of Gold. On a manually annotated test set, the module achieved a recognition F-score of 81.5%.

2.1 Input and Output Specification

The input to the mobile phone name recognition module is a sequence of sentences in English that have already been marked up in XML style for word boundaries, with part-of-speech tags automatically assigned by RASP (Briscoe and Carroll 2002). Since this happens before the morphological analyser in the RASP pipeline, the tokens have not been lemmatized. For example, given the sentence "I am thinking of upgrading to the Sony Ericsson T68is from a Nokia 8260", the input to the module is:

    ^ ^
    <w s='2' e='2'>i</w> PPIS1
    <w s='4' e='5'>am</w> VBM
    <w s='7' e='14'>thinking</w> VVG
    <w s='16' e='17'>of</w> IO
    <w s='19' e='27'>upgrading</w> NN1
    <w s='29' e='30'>to</w> II
    <w s='32' e='34'>the</w> AT
    <w s='36' e='39'>sony</w> NP1
    <w s='41' e='48'>ericsson</w> NP1
    <w s='50' e='54'>t68is</w> NN1
    <w s='56' e='59'>from</w> II
    <w s='61' e='61'>a</w> AT1
    <w s='63' e='67'>nokia</w> NP1
    <w s='69' e='72'>8260</w> MC
    ^ ^

The task of the module is to mark up the mobile phone named entities in the input, namely Sony Ericsson T68is and Nokia 8260 in this example:

    ^ ^
    <w s='2' e='2'>i</w> PPIS1
    <w s='4' e='5'>am</w> VBM
    <w s='7' e='14'>thinking</w> VVG
    <w s='16' e='17'>of</w> IO
    <w s='19' e='27'>upgrading</w> NN1
    <w s='29' e='30'>to</w> II
    <w s='32' e='34'>the</w> AT
    <w netype='phone'> <w s='36' e='39'>sony</w> <w s='41' e='48'>ericsson</w> <w s='50' e='54'>t68is</w> </w> NP
    <w s='56' e='59'>from</w> II
    <w s='61' e='61'>a</w> AT1
    <w netype='phone'> <w s='63' e='67'>nokia</w> <w s='69' e='72'>8260</w> </w> NP
    ^ ^

Here Sony Ericsson T68is and Nokia 8260 are marked up as named entities of type mobile phone (i.e. netype='phone'). Each is then treated as a single unit tagged as NP, namely a proper name. The analysis based on this output from the module is taken further down the RASP pipeline and yields the following RMRS representation:

[Figure: RMRS representation]
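To make the markup step of Section 2.1 concrete, here is a minimal Python sketch that parses tokens in the input format shown above and wraps a given token span in a netype='phone' element retagged as NP. The function names and the regular expression are illustrative assumptions, not part of RASP or of the module described here; the span to mark is supplied by hand rather than recognised.

```python
import re
from typing import List, Tuple

# One token: <w s='START' e='END'>word</w> TAG  (format taken from the example)
TOKEN_RE = re.compile(r"<w s='(\d+)' e='(\d+)'>([^<]*)</w>\s+(\S+)")

Token = Tuple[int, int, str, str]  # (start offset, end offset, word, POS tag)


def parse_tokens(markup: str) -> List[Token]:
    """Extract (start, end, word, tag) tuples from the tagged input."""
    return [(int(s), int(e), word, tag)
            for s, e, word, tag in TOKEN_RE.findall(markup)]


def mark_entity(tokens: List[Token], start: int, end: int) -> str:
    """Wrap tokens[start:end] as one phone entity tagged NP; keep the rest."""
    out = []
    for i, (s, e, word, tag) in enumerate(tokens):
        w = f"<w s='{s}' e='{e}'>{word}</w>"
        if i == start:
            out.append("<w netype='phone'>")
        # Tokens inside the entity lose their individual POS tags.
        out.append(w if start <= i < end else f"{w} {tag}")
        if i == end - 1:
            out.append("</w> NP")
    return " ".join(out)


SAMPLE = ("<w s='61' e='61'>a</w> AT1 "
          "<w s='63' e='67'>nokia</w> NP1 "
          "<w s='69' e='72'>8260</w> MC")

if __name__ == "__main__":
    print(mark_entity(parse_tokens(SAMPLE), 1, 3))
```

The sketch handles a single entity per call; marking several entities in one sentence would need the spans applied together, which is left out for brevity.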
2.2 Construction of an annotated sub-corpus

For work described in Workpackage 2B, a 4,000,000-word corpus of Internet discussions on mobile phones was created for the domain-specific extraction of a verb subcategorisation lexicon (Carroll and Fang 2004). From this corpus, we randomly selected two sets of 200 texts each. Each text was then manually annotated such that each instance of a mobile phone name, a model number, or any combination of the two was marked up as an entity (<mobile> and </mobile>). Here is an example:

    I have for sale the following ORIGINAL <mobile> Nokia </mobile> accessories that will fit any of the <mobile> Nokia 6100 </mobile> / <mobile> 5100 </mobile> series phones, including but not limited to <mobile> 6160 </mobile>, <mobile> 6190 </mobile>, <mobile> 6188 </mobile>, <mobile> 6185 </mobile>, <mobile> 6162 </mobile>, <mobile> 6161 </mobile>, <mobile> 6185i </mobile>, <mobile> 5160 </mobile>, <mobile> 5190 </mobile>, etc.
The two sets are summarised in Table 1:

Table 1: A summary of the annotated corpus (Texts, Sentences, Words and Entities for Set 1, Set 2 and in total)

2.3 The recognition program

The automatic recogniser was implemented in C. The algorithm was designed based on the observation that the distribution of mobile phone names in our corpus is relatively sparse. There is insufficient data to train a purely statistical recogniser (e.g. a Maximum Entropy Model); it may however be possible to train a combined symbolic/statistical model (incorporating information, for example, on manufacturer names).

A set of mobile phone manufacturer names, such as Nokia and Ericsson, was manually drawn up. The remainder of the mobile phone corpus that had not been annotated (ca. 2,800,000 tokens) was then used to construct a list of all the alphanumeric strings that contain at least one digit and that immediately follow one of these names. This process resulted in two entity sets:

- a list of mobile phone names
- a list of model numbers with their associated mobile phone names

The automatic recogniser marks the following as an entity:

- every occurrence of the mobile phone names
- every occurrence of the model numbers, given the following conditions:
  o they are longer than 3 characters
  o they occurred more than once in the training corpus
  o they occurred fewer than a set number of times in the training corpus [1]

[1] Numbers occurring more than 2000 times are interpreted as genuine "free" cardinals that are unlikely to be used in reference to a mobile phone.
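The list construction and filtering conditions of Section 2.3 can be sketched in Python as follows. This is an illustration under stated assumptions, not the C recogniser itself: the upper frequency limit of 2000 is taken from the footnote, frequencies are counted over all occurrences of a string in the training tokens, adjacent hits are merged into one entity, and the tiny manufacturer set contains only the two names mentioned in the text.

```python
from collections import Counter
from typing import List, Set, Tuple

MANUFACTURERS = {"nokia", "ericsson"}  # the two names given in the text
MIN_LENGTH = 4    # "longer than 3 characters"
MIN_COUNT = 2     # "occurred more than once"
MAX_COUNT = 2000  # assumption: upper limit read off the footnote


def collect_model_numbers(training: List[str]) -> Set[str]:
    """Alphanumeric strings with a digit that directly follow a manufacturer name."""
    counts = Counter(training)  # assumption: overall frequency in the corpus
    candidates = {
        cur for prev, cur in zip(training, training[1:])
        if prev in MANUFACTURERS and cur.isalnum()
        and any(ch.isdigit() for ch in cur)
    }
    return {
        c for c in candidates
        if len(c) >= MIN_LENGTH and MIN_COUNT <= counts[c] < MAX_COUNT
    }


def recognise(tokens: List[str], model_numbers: Set[str]) -> List[Tuple[int, int]]:
    """Return (start, end) token spans; adjacent hits are merged (assumption)."""
    spans, start = [], None
    for i, tok in enumerate(tokens):
        hit = tok in MANUFACTURERS or tok in model_numbers
        if hit and start is None:
            start = i
        elif not hit and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tokens)))
    return spans


if __name__ == "__main__":
    train = "my nokia 8260 died so i bought another nokia 8260 and 1 ericsson t68".split()
    models = collect_model_numbers(train)
    print(models)
    print(recognise("upgrading from a nokia 8260 today".split(), models))
```

In the toy training data, "t68" is rejected both for its length and because it occurs only once, which mirrors the first two conditions in the list.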
2.4 A quantitative evaluation of the mobile phone name recogniser

For the quantitative evaluation of the mobile phone name recogniser's performance, the first annotated set was used for development and the second set was kept for testing. Both sets were sub-divided into four sub-sets containing the same number of word tokens, in order to reveal any variation in performance. The initial run of the recogniser on the development set produced the following results:

Table 2: Performance before tuning on the development set (Precision, Recall and F-Score per sub-set and in total)

The F-Score for the development set was just under 80%. Variations across the four sub-sets can be observed, with Set 3 showing the best F-Score of 83.6%. The output was manually inspected and changes were made to the list of mobile phone names and model numbers. Subsequent performance on the development set shows an F-Score of 82.1%, an increase of nearly 3 percentage points over the previous 79.4%:

Table 3: Performance after tuning on the development set (Precision, Recall and F-Score per sub-set and in total)

When tested on the test set, the recogniser achieved an overall F-Score of 81.5%, with a precision of 81% and a recall of 81.9%:

Table 4: Performance on the test set (Precision, Recall and F-Score per sub-set and in total)

As can be observed from Table 4, the best sub-set F-Score was 91.1% and the worst was 73.2%. This considerable variation within the test set suggests that the performance of the system varies with different types of input.
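The report does not spell out how precision, recall and F-Score were computed. A minimal sketch, assuming exact-match scoring of entity spans and the balanced F-Score F = 2PR / (P + R), looks like this in Python; the span tuples in the example are made up for illustration.

```python
from typing import Iterable, Tuple

Span = Tuple[int, int, int]  # hypothetical (sentence id, start token, end token)


def f_score(precision: float, recall: float) -> float:
    """Balanced F-Score (harmonic mean of precision and recall)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f(gold: Iterable[Span],
                       predicted: Iterable[Span]) -> Tuple[float, float, float]:
    """Exact-match scoring: a predicted span counts only if it is in the gold set."""
    gold_set, pred_set = set(gold), set(predicted)
    correct = len(gold_set & pred_set)
    precision = correct / len(pred_set) if pred_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    return precision, recall, f_score(precision, recall)


if __name__ == "__main__":
    gold = [(0, 3, 5), (0, 9, 10), (1, 2, 3)]
    predicted = [(0, 3, 5), (1, 2, 3), (1, 7, 8), (2, 0, 1)]
    print(precision_recall_f(gold, predicted))
    # F-Score from the rounded test-set precision and recall quoted above
    print(round(f_score(0.81, 0.819), 3))
```

Under this assumption, the quoted test-set precision (81%) and recall (81.9%) give an F-Score of about 0.814, close to the reported 81.5%; a small difference of this size is what one would expect if the published precision and recall are themselves rounded.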
2.5 Error Analysis

There are two major sources of errors. First of all, there is frequent ambiguity between phone names and company names, as in the following example:

    I was wondering if anyone has any information on how the Ericsson Bluetooth kits calculates the BER packets when the BER test is run.

where Ericsson can be analysed as referring to the company instead of the phone. Arguably, this is a genuinely ambiguous case. The second major ambiguity is between numbers and model numbers:

    There are 2 connectors on the cable, 1 RS 232 and 1 cigarette lighter.

Since 232 has been observed before as co-occurring with mobile phone names, the module assumes that in the current context it refers to a mobile phone product and therefore erroneously marks it as a phone name.

3 References

Briscoe, E. and J. Carroll (2002): Robust accurate statistical annotation of general text. In Proceedings of the 3rd International Conference on Language Resources and Evaluation, Las Palmas, Gran Canaria.

Carroll, J. and A.C. Fang (2004): The Automatic Acquisition of Verb Subcategorisations and their Impact on an HPSG Parser. In Proceedings of the 1st International Joint Conference on Natural Language Processing, March 2004, Hainan, China.

Uszkoreit, Hans, Ulrich Callmeier, Andreas Eisele, Ulrich Schäfer, Melanie Siegel, Jakob Uszkoreit (2004): Hybrid Robust Deep and Shallow Semantic Processing for Creativity Support in Document Production. In Proceedings of KONVENS 2004, Vienna, Austria.

Callmeier, Ulrich, Andreas Eisele, Ulrich Schäfer and Melanie Siegel (2004): The Core Architecture Framework. In Proceedings of LREC 04, Lisbon, Portugal.
More informationEvaluation of Named Entity Recognition in Dutch online criminal complaints
Evaluation of Named Entity Recognition in Dutch online criminal complaints Marijn Schraagen Floris Bex Matthieu Brinkhuis Utrecht University June 12, 2017 Internet fraud Online trade is widespread Transactions
More informationTIPSTER Text Phase II Architecture Requirements
1.0 INTRODUCTION TIPSTER Text Phase II Architecture Requirements 1.1 Requirements Traceability Version 2.0p 3 June 1996 Architecture Commitee tipster @ tipster.org The requirements herein are derived from
More informationA Multilingual Social Media Linguistic Corpus
A Multilingual Social Media Linguistic Corpus Luis Rei 1,2 Dunja Mladenić 1,2 Simon Krek 1 1 Artificial Intelligence Laboratory Jožef Stefan Institute 2 Jožef Stefan International Postgraduate School 4th
More informationDiscriminative Training with Perceptron Algorithm for POS Tagging Task
Discriminative Training with Perceptron Algorithm for POS Tagging Task Mahsa Yarmohammadi Center for Spoken Language Understanding Oregon Health & Science University Portland, Oregon yarmoham@ohsu.edu
More informationWebAnno: a flexible, web-based annotation tool for CLARIN
WebAnno: a flexible, web-based annotation tool for CLARIN Richard Eckart de Castilho, Chris Biemann, Iryna Gurevych, Seid Muhie Yimam #WebAnno This work is licensed under a Attribution-NonCommercial-ShareAlike
More informationstructure of the presentation Frame Semantics knowledge-representation in larger-scale structures the concept of frame
structure of the presentation Frame Semantics semantic characterisation of situations or states of affairs 1. introduction (partially taken from a presentation of Markus Egg): i. what is a frame supposed
More informationTaming Text. How to Find, Organize, and Manipulate It MANNING GRANT S. INGERSOLL THOMAS S. MORTON ANDREW L. KARRIS. Shelter Island
Taming Text How to Find, Organize, and Manipulate It GRANT S. INGERSOLL THOMAS S. MORTON ANDREW L. KARRIS 11 MANNING Shelter Island contents foreword xiii preface xiv acknowledgments xvii about this book
More informationAnette Frank, Markus Becker, Berthold Crysmann, Bernd Kiefer and Ulrich Schäfer
Integrated Shallow and Deep Parsing: TopP meets HPSG Anette Frank, Markus Becker, Berthold Crysmann, Bernd Kiefer and Ulrich Schäfer DFKI GmbH School of Informatics 66123 Saarbrücken, Germany University
More informationAutomatic Evaluation of Parser Robustness: Eliminating Manual Labor and Annotated Resources
Automatic Evaluation of Parser Robustness: Eliminating Manual Labor and Annotated Resources Johnny BIGERT KTH Nada SE-10044 Stockholm johnny@nada.kth.se Jonas SJÖBERGH KTH Nada SE-10044 Stockholm jsh@nada.kth.se
More informationANC2Go: A Web Application for Customized Corpus Creation
ANC2Go: A Web Application for Customized Corpus Creation Nancy Ide, Keith Suderman, Brian Simms Department of Computer Science, Vassar College Poughkeepsie, New York 12604 USA {ide, suderman, brsimms}@cs.vassar.edu
More informationA Linguistic Approach for Semantic Web Service Discovery
A Linguistic Approach for Semantic Web Service Discovery Jordy Sangers 307370js jordysangers@hotmail.com Bachelor Thesis Economics and Informatics Erasmus School of Economics Erasmus University Rotterdam
More informationPrediction-Based NLP System by Boyer-Moore Algorithm for Requirements Elicitation
Prediction-Based NLP System by Boyer-Moore Algorithm for Requirements Elicitation Dr A.Sumithra 1, K.Poongothai 2, Dr S.Gavaskar 3 1 Associate Professor, Dept of Computer Science & Engineering, VSB College
More informationLab II - Product Specification Outline. CS 411W Lab II. Prototype Product Specification For CLASH. Professor Janet Brunelle Professor Hill Price
Lab II - Product Specification Outline CS 411W Lab II Prototype Product Specification For CLASH Professor Janet Brunelle Professor Hill Price Prepared by: Artem Fisan Date: 04/20/2015 Table of Contents
More informationInformation Retrieval
Multimedia Computing: Algorithms, Systems, and Applications: Information Retrieval and Search Engine By Dr. Yu Cao Department of Computer Science The University of Massachusetts Lowell Lowell, MA 01854,
More informationCorpus methods for sociolinguistics. Emily M. Bender NWAV 31 - October 10, 2002
Corpus methods for sociolinguistics Emily M. Bender bender@csli.stanford.edu NWAV 31 - October 10, 2002 Overview Introduction Corpora of interest Software for accessing and analyzing corpora (demo) Basic
More informationA platform for collaborative semantic annotation
A platform for collaborative semantic annotation Valerio Basile and Johan Bos and Kilian Evang and Noortje Venhuizen {v.basile,johan.bos,k.evang,n.j.venhuizen}@rug.nl Center for Language and Cognition
More informationA Textual Entailment System using Web based Machine Translation System
A Textual Entailment System using Web based Machine Translation System Partha Pakray 1, Snehasis Neogi 1, Sivaji Bandyopadhyay 1, Alexander Gelbukh 2 1 Computer Science and Engineering Department, Jadavpur
More informationQuestion Answering Approach Using a WordNet-based Answer Type Taxonomy
Question Answering Approach Using a WordNet-based Answer Type Taxonomy Seung-Hoon Na, In-Su Kang, Sang-Yool Lee, Jong-Hyeok Lee Department of Computer Science and Engineering, Electrical and Computer Engineering
More informationEnglish Understanding: From Annotations to AMRs
English Understanding: From Annotations to AMRs Nathan Schneider August 28, 2012 :: ISI NLP Group :: Summer Internship Project Presentation 1 Current state of the art: syntax-based MT Hierarchical/syntactic
More information