Semantics Isn't Easy: Thoughts on the Way Forward
Transcription
Semantics Isn't Easy: Thoughts on the Way Forward

Nancy Ide, Vassar College
Rebecca Passonneau, Columbia University
Collin Baker, ICSI/UC Berkeley
Christiane Fellbaum, Princeton University

New York University, November 14-15, 2008

Approaches to Semantic Knowledge Acquisition
- Semi/unsupervised statistical methods to gather facts from unannotated corpora
- Smaller, (semi-)manual, detailed analyses using annotated resources
- Both approaches essential; best leveraged if performed in close coordination

Semi/unsupervised learning: Advantages
- Few resources required (inexpensive)
  - Although may implicitly use throw-away annotations
  - Implicitly reliant on a lot of knowledge
- Fast
  - Can rapidly amass volumes of information
- Web provides vast amounts of language data
  - Size overcomes noise?
- Reliability is a concern
  - Results skewed by different varieties and genres, different types of speakers
  - E.g. British vs. American English split in the frequency of a phenomenon
  - E.g. inclusion of vast amounts of incorrect language or persistent non-native speaker errors: unreliable information for ESL?

Semi/unsupervised learning: Limitations
- Assumes a (single) set of semantic facts (meanings, relations, etc.) that
  - is stable and discoverable
  - humans can agree on (90% of the time?)
- Web data unreliable
  - Distributional features that are fundamentally unknown
  - Left with a linguistic black box
  - Variations due to genre, dialects, situation, context, when produced, etc. conflated
- Cannot capture fluid, dynamic, generative aspects of word and phrasal meaning
  - We have not explored means to represent and process language that takes this into account

Analyses relying on annotated resources: Advantages
- Overcomes disadvantages of unsupervised / unannotated approaches
- Can get at the more fluid and dynamic aspects of language
- Can examine impact of genre, situation, context, dialect, etc. by controlling corpus content
- Heavily annotated data enables exploration of interrelations among linguistic layers
  - A critical next step for NLP research

Analyses relying on annotated resources: Disadvantages
- Expensive!
  - Costly to manually or even semi-manually produce reliable language data and annotations
- Slow
  - Manual work takes time

MASC: Manually Annotated Sub-Corpus
- NSF-funded project to provide a sharable, reusable annotated resource with rich linguistic annotations
- Texts from wide range of genres
- Manual annotations or manually-validated annotations for multiple levels
  - WordNet senses
  - FrameNet frames and frame elements
  - shallow parses
  - named entities
- Enables linking WordNet senses and FrameNet frames into more complex semantic structures
- Enriches semantic and pragmatic information
- Detailed inter-annotator agreement measures

Contents
- Texts drawn from
  - Open ANC
  - Freely distributable portions of LU Corpus
  - Subset of Wall Street Journal texts that have been heavily annotated by multiple projects
- Several genres
  - Written (travel guides, blog, fiction, letters, newspaper, non-fiction, technical, journal, government documents, court transcript)
  - Spoken (face-to-face, academic, telephone)
- Free of license restrictions, redistributable
- All MASC data and annotations freely downloadable from ANC website (

Annotation Process
- Smaller portions of the sub-corpus manually annotated for specific phenomena
  - Maintain representativeness
  - Include as many annotations of different types as possible
- Apply (semi-)automatic annotation techniques to determine the reliability of their results
- Study inter-annotator agreement on manually-produced annotations
  - Determine benchmark of accuracy
  - Fine-tune annotator guidelines
- Consider if accurate annotations for one phenomenon can improve performance of automatic annotation systems for another
  - E.g., validated WN sense tags and noun chunks may improve automatic semantic role labeling

Process (continued)
- Apply iterative process to maximize performance of automatic taggers
  - Manual annotation
  - Retrain automatic annotation software
- Improved annotation software later applied to the entire OANC
  - Provide more accurate automatically-produced annotation of full corpus
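
The slides do not say which inter-annotator agreement measure the project uses. As an illustration only, the sketch below computes Cohen's kappa, one common chance-corrected measure for two annotators. The function name and the sense labels are hypothetical and are not taken from MASC.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items.

    Assumption: this is an illustrative measure, not necessarily the one
    used in the project described in the slides.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Observed agreement: share of items on which the two labels match.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each annotator's label distribution.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        # Both annotators used a single identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical sense tags for four occurrences of one word.
annotator_1 = ["sense-1", "sense-2", "sense-1", "sense-1"]
annotator_2 = ["sense-1", "sense-2", "sense-2", "sense-1"]
kappa = cohen_kappa(annotator_1, annotator_2)
print(f"kappa = {kappa:.2f}")
```

A value near 1 indicates agreement well above chance. Low values on a pilot batch are the kind of signal that would prompt fine-tuning the annotator guidelines before annotating further.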

Representation
- ISO TC37 SC4 Linguistic Annotation Framework
- Graph of feature structures (GrAF)
  - isomorphic to other feature structure-based representations (e.g. UIMA CAS)
- Each annotation in a separate stand-off document linked to primary data or other annotations
- Merge annotations with ANC API
- Output in any of several formats
  - XML
  - non-XML for use with systems such as NLTK and concordancing tools
  - UIMA CAS
  - Input to GraphViz

ANC Pipeline (diagram)
- Texts in different formats
- ANC processing: automatically annotate
- Primary data, plus annotations as graph of feature structures in stand-off XML documents
- ANC Tool: merge some or all annotations
- Input to UIMA, input to GraphViz, input to NLTK, others...
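
To make the stand-off idea concrete, here is a minimal sketch in which each annotation layer is kept apart from the primary text and refers to it only by character offsets. It does not use the ANC API or the GrAF XML format. The `Node` class, the `merge_layers` function, the layer names, and the feature values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One annotation node anchored to a character span of the primary text."""
    node_id: str
    start: int
    end: int
    features: dict = field(default_factory=dict)


def merge_layers(*layers):
    """Merge (layer_name, nodes) pairs into one span-indexed structure.

    Nodes from different layers that cover the same span end up together,
    with their features kept under the name of the layer they came from.
    """
    merged = {}
    for layer_name, nodes in layers:
        for node in nodes:
            span = (node.start, node.end)
            merged.setdefault(span, {})[layer_name] = dict(node.features)
    # Sort by span so the output follows text order.
    return dict(sorted(merged.items()))


# The primary text is never modified; layers only point into it.
text = "Nancy visited the bank."
ne_layer = ("ne", [Node("ne-0", 0, 5, {"type": "person"})])
fn_layer = ("framenet", [Node("fn-0", 6, 13, {"frame": "hypothetical-frame"})])
wn_layer = ("wordnet", [Node("wn-0", 18, 22, {"sense": "hypothetical-sense-id"})])

merged = merge_layers(ne_layer, wn_layer, fn_layer)
for (start, end), layer_features in merged.items():
    print(text[start:end], layer_features)
```

Because each layer is independent, a user can merge only the layers they need, which is the property the "merge some or all annotations" step relies on.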

Transduction
- Different annotation formats: PTB, PropBank, NomBank, PDTB, TimeBank
- Transduce to GrAF
- Merge

Alignment of Lexical Resources
- Concurrent NSF-funded project investigating how and to what extent WordNet and FrameNet can be aligned
- MASC annotations of FrameNet frames and frame elements and WordNet senses provide a ready-made testing ground

Goals
- Continually augment MASC with contributed annotations from the research community
  - Discourse structure, additional entities, events, opinions, etc.
- Distribution of effort and integration of currently independent resources such as the ANC, WordNet, and FrameNet will enable progress in resource development
  - Less cost
  - No duplication of effort
  - Greater degree of accuracy and usability
  - Harmonization
- MASC can serve as a model for community effort to develop required methods and resources to further NLP research

MASC
- Will be the largest semantically annotated corpus of English in existence
- Should have a major impact on the speed with which similar resources can be reliably annotated
- WN and FN annotation of the MASC will immediately create a massive multi-lingual resource network
  - Both WN and FN linked to corresponding resources in other languages
  - No existing resource approaches this scope
- Because it enables merging annotations at different linguistic levels, will
  - facilitate a deeper investigation of interactions among linguistic phenomena
  - contribute to better understanding of the workings of language at the semantic level

Recommendations and Conclusions
- Pursue automatic acquisition efforts and manual resource creation, annotation, and analysis in parallel
  - Automatic acquisition can get us to the ~80% ceiling
  - Manual effort can get us the other 20%
- Embrace the need to render the knowledge resources created by automatic acquisition in a form and format that can interoperate with annotations and other resources
  - NLP community does not need yet-another-independent-resource that is difficult or impossible to use with other resources

PS
- OANC available at
- 1st set of MASC data (~120K words) should be available by end of year, augmented regularly after that
- We encourage contributions of annotations (automatic or manual) of MASC and/or OANC data for any linguistic phenomenon, in any format
  - We will do the transduction to GrAF
More informationPlagiarism Detection Using FP-Growth Algorithm
Northeastern University NLP Project Report Plagiarism Detection Using FP-Growth Algorithm Varun Nandu (nandu.v@husky.neu.edu) Suraj Nair (nair.sur@husky.neu.edu) Supervised by Dr. Lu Wang December 10,
More informationEnhancing Automatic Wordnet Construction Using Word Embeddings
Enhancing Automatic Wordnet Construction Using Word Embeddings Feras Al Tarouti University of Colorado Colorado Springs 1420 Austin Bluffs Pkwy Colorado Springs, CO 80918, USA faltarou@uccs.edu Jugal Kalita
More informationOntoNotes: A Unified Relational Semantic Representation
OntoNotes: A Unified Relational Semantic Representation Sameer S. Pradhan BBN Technologies Cambridge, MA 0138 Martha Palmer University of Colorado Boulder, CO 80309 Eduard Hovy ISI/USC Marina Del Rey,
More informationCf. Gasch (2008), chapter 1.2 Generic and project-specific XML Schema design", p. 23 f. 5
DGD 2.0: A Web-based Navigation Platform for the Visualization, Presentation and Retrieval of German Speech Corpora 1. Introduction 1.1 The Collection of German Speech Corpora at the IDS The "Institut
More informationAid to spatial navigation within a UIMA annotation index
Aid to spatial navigation within a UIMA annotation index Nicolas Hernandez LINA CNRS UMR 6241 University de Nantes Darmstadt, 3rd UIMA@GSCL Workshop, September 23, 2013 N. Hernandez Spatial navigation
More informationEUDICO, Annotation and Exploitation of Multi Media Corpora over the Internet
EUDICO, Annotation and Exploitation of Multi Media Corpora over the Internet Hennie Brugman, Albert Russel, Daan Broeder, Peter Wittenburg Max Planck Institute for Psycholinguistics P.O. Box 310, 6500
More informationDeliverable D1.4 Report Describing Integration Strategies and Experiments
DEEPTHOUGHT Hybrid Deep and Shallow Methods for Knowledge-Intensive Information Extraction Deliverable D1.4 Report Describing Integration Strategies and Experiments The Consortium October 2004 Report Describing
More information