Experiences with UIMA in NLP teaching and research. Manuela Kunze, Dietmar Rösner
1 Experiences with UIMA in NLP teaching and research Manuela Kunze, Dietmar Rösner University of Magdeburg, Knowledge Based Systems and Document Processing
2 Overview What is UIMA? First Experiments NLP Teaching Conclusion 2
3 UIMA: Unstructured Information Management Architecture: a software architecture for developing and deploying unstructured information management (UIM) applications. UIM application: a software system that analyzes large volumes of unstructured information to discover, organize, and deliver relevant knowledge to the end user. The software architecture specifies component interfaces, data representations, 3
4 UIMA: Unstructured Information Management Architecture
- Collection Reader: interfaces to a collection of data items (e.g., documents) to be analyzed. Collection Readers return CASes that contain the documents to analyze, possibly along with additional metadata.
- CAS Initializer: may be used by a Collection Reader to populate a CAS from a document. An example of a CAS Initializer is an HTML parser that de-tags an HTML document and also inserts paragraph annotations (determined from <P> tags in the original HTML) into the CAS.
- Analysis Engine: takes a CAS, analyzes its contents, and produces an enriched CAS. Analysis Engines can be recursively composed of other Analysis Engines (called an Aggregate Analysis Engine). Aggregates may also contain CAS Consumers.
- CAS Consumer: consumes the enriched CAS that was produced by the sequence of Analysis Engines before it, and produces an application-specific data structure, such as a search engine index or database.
CAS: Common Analysis Structure. CPE: Collection Processing Engine.
[Ferrucci et al.: Unstructured Information Management Architecture (UIMA): SDK User's Guide and Reference] 4
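The flow from initializer through analysis engines to consumer can be illustrated with a minimal plain-Java sketch. It does not use the UIMA SDK: the names `PipelineSketch`, `Cas`, `AnalysisEngine`, `CasConsumer`, `initialize` and `run` are hypothetical stand-ins, the initializer only strips tags (it does not insert paragraph annotations), and Java 16 or later is assumed for records.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PipelineSketch {

    /** One annotation: a typed span over the document text. */
    record Annotation(String type, int begin, int end) {}

    /** Simplified stand-in for a CAS: the document text plus its annotations. */
    static final class Cas {
        final String text;
        final List<Annotation> annotations = new ArrayList<>();
        Cas(String text) { this.text = text; }
    }

    /** An analysis engine takes a CAS and enriches it in place. */
    interface AnalysisEngine { void process(Cas cas); }

    /** A CAS consumer turns the enriched CAS into an application-specific result. */
    interface CasConsumer<R> { R consume(Cas cas); }

    /** Initializer step (simplified): strip HTML tags and keep only the text. */
    static Cas initialize(String html) {
        return new Cas(html.replaceAll("<[^>]*>", ""));
    }

    /** Run the engines in order, then hand the enriched CAS to the consumer. */
    static <R> R run(String html, List<AnalysisEngine> engines, CasConsumer<R> consumer) {
        Cas cas = initialize(html);
        for (AnalysisEngine engine : engines) {
            engine.process(cas);
        }
        return consumer.consume(cas);
    }

    public static void main(String[] args) {
        Pattern time = Pattern.compile("\\d{1,2}(?::\\d{2})?(?:am|pm)");
        // Annotator: mark every clock time such as "9am" or "5:30pm".
        AnalysisEngine timeAnnotator = cas -> {
            Matcher m = time.matcher(cas.text);
            while (m.find()) {
                cas.annotations.add(new Annotation("Time", m.start(), m.end()));
            }
        };
        // Consumer: collect the text covered by each annotation.
        CasConsumer<List<String>> coveredText = cas -> {
            List<String> out = new ArrayList<>();
            for (Annotation a : cas.annotations) {
                out.add(cas.text.substring(a.begin(), a.end()));
            }
            return out;
        };
        System.out.println(run("<p>Open 9am-5pm daily</p>", List.of(timeAnnotator), coveredText));
    }
}
```

In the real framework these roles are separate components wired together by descriptors; the sketch only shows the order in which they touch the shared analysis structure.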
5 UIMA: Unstructured Information Management Architecture Analysis Engine (AE): a component that analyzes artifacts (e.g. documents) and infers information about them. It consists of two parts: Java classes (typically packaged as one or more JAR files) and AE descriptors (one or more XML files). The descriptors contain the configuration settings for the Analysis Engine as well as a description of the AE's input and output requirements. 5
6 UIMA: Unstructured Information Management Architecture [Diagram, labels only: the XML analysis engine descriptor describes the analysis engine (annotator class, input parameters, output annotations, external resources) and is linked to a type system; the type system defines annotation types (name, features (begin, end, )) and is used to create the annotation interface; the Java annotator is defined against it and uses processing resources and, through a resource interface, external resources.] 6
7 UIMA: Unstructured Information Management Architecture Aggregate Analysis Engine: combines different analysis engines within one Analysis Engine [Ferrucci et al.: Unstructured Information Management Architecture (UIMA): SDK User's Guide and Reference] 7
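The idea of an aggregate can be sketched as a composite: an engine that is built from other engines and can itself be nested. This is an illustration in plain Java, not the UIMA API; `AggregateSketch`, `Engine` and `aggregate` are hypothetical names, and the two toy engines only count sentences and words.

```java
import java.util.ArrayList;
import java.util.List;

final class AggregateSketch {

    /** Hypothetical stand-in for an analysis engine that appends annotation labels to a shared list. */
    interface Engine { void process(String text, List<String> annotations); }

    /** An aggregate is itself an Engine that delegates to its parts in a fixed order. */
    static Engine aggregate(List<Engine> parts) {
        return (text, annotations) -> {
            for (Engine part : parts) {
                part.process(text, annotations);
            }
        };
    }

    public static void main(String[] args) {
        Engine sentences = (text, anns) -> anns.add("sentences:" + text.split("[.!?]\\s*").length);
        Engine words = (text, anns) -> anns.add("words:" + text.trim().split("\\s+").length);
        // Aggregates nest: the inner aggregate is used like any primitive engine.
        Engine all = aggregate(List.of(aggregate(List.of(sentences)), words));
        List<String> annotations = new ArrayList<>();
        all.process("Open daily. Closed Friday.", annotations);
        System.out.println(annotations);
    }
}
```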
8 Overview Introduction First Experiments NLP Teaching Conclusion 8
9 First Experiments: UIMA vs. GATE baseline: 2 persons, 2 systems, 1 corpus and 1 extraction task. Skills/experiences of the persons: [table with rows Person 1, Person 2 and columns UIMA, GATE, Eclipse/Java; cell values not preserved in the transcript] 9
10 Task of the Experiment: process a corpus of websites to detect and extract information relevant for tourists (opening times of museums, prices of hotels, ). Corpus: 30 tourism websites of Egypt, plus an additional 20 websites of Washington, New York, London. Output: Prolog facts for a reasoner. Questions: Which museum is now open? 10
11 Evaluation Topics/Points ease of getting acquainted with the system? quality of documentation: completeness, clarity, up-to-date,? tutorials, use cases,? processing and linguistic resources? lexica, gazetteer lists, tools. tools for resource maintenance and extension? quality: self-explanatory, robust, comfortable. speed of processing? single document vs. large corpora? limitations, suggestions for improvement? support for im-/export of a variety of document formats? 11
12 Excerpts from the Corpus The Egyptian Museum is open the hours: 9am-5pm daily The Military Museum is open the hours: Summer: 8am- 5:30pm; winter: 8am-4:30pm Palace Museum is open the hours: 8am-5:30pm (summer) 8am-4:30pm (winter) 10am-2pm, 6pm-9pm Sat-Wed; 6pm-9pm Fri 12
13 UIMA Application several annotators (like a pipeline), based on regular expressions: museum pattern, time pattern, restrictions; a window covering a museum and opening hours; a window covering two time intervals and a restriction. Example text: *Fraunces Tavern Museum* 54 Pearl St Tuesday-Friday, 12pm-5pm; Prolog facts (museum information, interval of times): museumopen('fraunces Tavern Museum ', ' T12:00:00',' T17:00:00'). museumopen('fraunces Tavern Museum ', ' T12:00:00',' T17:00:00'). museumopen('fraunces Tavern Museum ', ' T12:00:00',' T17:00:00'). 13
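A regular-expression annotator of this kind, followed by export to a Prolog fact, might look like the sketch below. It is an assumption-laden illustration, not the authors' implementation: the class `OpeningHoursSketch`, the pattern, and the helpers `to24h` and `toPrologFact` are hypothetical, only the predicate name `museumopen` is taken from the slide, and the facts carry times only because the date part is not recoverable from the transcript.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class OpeningHoursSketch {

    // Matches intervals such as "12pm-5pm", "8am-5:30pm" or "8am- 5:30pm".
    private static final Pattern INTERVAL = Pattern.compile(
        "(\\d{1,2})(?::(\\d{2}))?(am|pm)\\s*-\\s*(\\d{1,2})(?::(\\d{2}))?(am|pm)");

    /** Convert a 12-hour clock reading to "HH:mm:ss"; minute may be null. */
    static String to24h(String hour, String minute, String meridiem) {
        int h = Integer.parseInt(hour) % 12;      // 12am -> 0; 12pm -> 0, then +12 below
        if (meridiem.equals("pm")) {
            h += 12;
        }
        int m = (minute == null) ? 0 : Integer.parseInt(minute);
        return String.format(Locale.ROOT, "%02d:%02d:00", h, m);
    }

    /** Build a museumopen/3 fact from the first interval found in the text, if any. */
    static Optional<String> toPrologFact(String museum, String text) {
        Matcher m = INTERVAL.matcher(text);
        if (!m.find()) {
            return Optional.empty();
        }
        String open = to24h(m.group(1), m.group(2), m.group(3));
        String close = to24h(m.group(4), m.group(5), m.group(6));
        String atom = museum.replace("'", "''");  // double any quote inside the Prolog atom
        return Optional.of(String.format(Locale.ROOT, "museumopen('%s', '%s', '%s').", atom, open, close));
    }

    public static void main(String[] args) {
        toPrologFact("Fraunces Tavern Museum", "54 Pearl St Tuesday-Friday, 12pm-5pm")
            .ifPresent(System.out::println);
    }
}
```

Restrictions such as "Tuesday-Friday" would need a further pattern and a combining window; the sketch leaves them out to keep the time handling visible.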
14 UIMA: Results information annotated in the documents: names of museums and hotels; times, time intervals; time restrictions; prices, intervals of prices (hotel prices); keywords for museum category; names of pharaohs (annotated with a correction of misspellings). Information about hotels and museums is exported into Prolog facts and into a short textual summary (templates filled with the detected information). hotels: Price information about Cosmopolitan Hotel : $157 museums: *** *Fraunces Tavern Museum* *** Open from 12:00:00 to 17:00:00; Restriction: Tuesday-Friday 14
15 UIMA vs. GATE: Conclusion no final judgement on whether to use GATE or UIMA; it depends on your task (task description, expected results, which processing resources are necessary) and on your preferences for the interface: prefer the Eclipse environment (or other Java editors), or prefer a comfortable GUI 15
16 UIMA vs. GATE: Conclusion GATE: tools available, comfortable GUI. UIMA: plain framework, simplified definition of (complex) result structures, simplified pre- and postprocessing of annotations. Both are extensible, e.g. for processing German documents 16
17 'German' Extension of Processing Resources XDOC document suite: tools for processing German documents, implemented in Common Lisp. For UIMA: Java reimplementation of the tools as several analysis engines 17
18 XDOC in UIMA annotation of part-of-speech (Morphix, heuristics) semantic categories named entities (vehicles, cities, ) a coarse approach for classification of PP using maxent library 18
19 UIMA: Evaluation documentation? - good - illustrative examples (tutorial) - completeness: some parts are described only very briefly - prior knowledge of Java programming and Eclipse is helpful 19
20 UIMA: Evaluation processing and linguistic resources? - annotators only from the tutorial: sentence annotation, word annotation, date/time annotators, examples for using regular expressions etc. - external resources can be integrated: lexical resources as external resources (text files); existing processing resources (implementation of an interface is necessary) 20
21 UIMA: Evaluation tools for resource maintenance and extension? - specific Eclipse component editors or - simple text editors 21
22 UIMA: Evaluation speed of processing? - faster than GATE? - the CPE gives detailed information about the processing time of each module 22
23 UIMA: Evaluation single docs vs. large corpora? - Collection Reader - document(s) from a directory 23
24 UIMA: Evaluation limitations, suggestions for improvement? - no limitations: everything is possible, but implementation or interfacing has to be done by the user - wish: more processing and linguistic resources within the distribution 24
25 UIMA: Evaluation im-/export of document formats? - import: CAS Initializer - export: CAS Consumer - transform annotations into any other format - export of document + annotations, or of annotations only - required: Java application 25
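The two export variants (document plus annotations, or annotations only) can be sketched in plain Java as follows. The class `ExportSketch`, its `Annotation` record and both output formats are assumptions chosen for illustration; the inline variant assumes non-overlapping spans and does no XML escaping, and Java 16 or later is assumed for records.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

final class ExportSketch {

    /** A typed span over the document text. */
    record Annotation(String type, int begin, int end) {}

    /** "Only annotations": one tab-separated line per annotation (type, begin, end, covered text). */
    static String standoff(String text, List<Annotation> annotations) {
        return annotations.stream()
            .map(a -> a.type() + "\t" + a.begin() + "\t" + a.end() + "\t"
                + text.substring(a.begin(), a.end()))
            .collect(Collectors.joining("\n"));
    }

    /** "Document + annotations": wrap each span in a tag; assumes spans do not overlap. */
    static String inline(String text, List<Annotation> annotations) {
        List<Annotation> sorted = annotations.stream()
            .sorted(Comparator.comparingInt(Annotation::begin))
            .collect(Collectors.toList());
        StringBuilder out = new StringBuilder();
        int pos = 0;
        for (Annotation a : sorted) {
            out.append(text, pos, a.begin());               // untouched text before the span
            out.append('<').append(a.type()).append('>')
               .append(text, a.begin(), a.end())
               .append("</").append(a.type()).append('>');
            pos = a.end();
        }
        return out.append(text.substring(pos)).toString();  // remaining text after the last span
    }

    public static void main(String[] args) {
        String text = "Open 9am-5pm daily";
        List<Annotation> anns = List.of(new Annotation("Time", 5, 8), new Annotation("Time", 9, 12));
        System.out.println(standoff(text, anns));
        System.out.println(inline(text, anns));
    }
}
```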
26 Overview Introduction First Experiments NLP Teaching Conclusion 26
27 NLP Teaching course: Information Extraction aim of the course: to make our students acquainted with information extraction as basic NLP technology UIMA, GATE students: computer science, data-knowledge engineering skills of the students: programming Java 27
28 NLP Teaching different corpora: news about FIFA world cup 2006 in Germany, description of drugs, announcements of new books, tasks for students: to develop different analysis engines and combine them for annotation of URLs, addresses, names of players, results of games, using regular expressions, external resources, maximum entropy models 28
29 NLP Teaching 29
30 UIMA: A Student's View easy to handle: Java programming (environment). Problems of students: understanding the dependencies between the several descriptors. Helpful for teaching (future work): a 'comparator' of different student solutions: which solution is the best, relative to a 'master' solution? 30
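The slide does not say how such a 'comparator' would work. One plausible reading, shown here purely as an assumption, scores each student's annotations against the master solution by exact-match precision, recall and F1. `ComparatorSketch`, `Annotation`, `Score` and `compare` are hypothetical names; Java 16 or later is assumed for records.

```java
import java.util.HashSet;
import java.util.Set;

final class ComparatorSketch {

    /** A typed span; records supply equals/hashCode, so sets compare by value. */
    record Annotation(String type, int begin, int end) {}

    record Score(double precision, double recall, double f1) {}

    /** Exact-match comparison: an annotation counts only if type and both offsets agree. */
    static Score compare(Set<Annotation> master, Set<Annotation> student) {
        Set<Annotation> correct = new HashSet<>(student);
        correct.retainAll(master);
        double p = student.isEmpty() ? 0.0 : (double) correct.size() / student.size();
        double r = master.isEmpty() ? 0.0 : (double) correct.size() / master.size();
        double f1 = (p + r == 0.0) ? 0.0 : 2 * p * r / (p + r);
        return new Score(p, r, f1);
    }

    public static void main(String[] args) {
        Set<Annotation> master = Set.of(new Annotation("Time", 5, 8), new Annotation("Time", 9, 12));
        Set<Annotation> student = Set.of(new Annotation("Time", 5, 8), new Annotation("Time", 0, 4));
        System.out.println(compare(master, student));
    }
}
```

Ranking the student solutions by F1 would then answer "which solution is the best"; partial-overlap matching is a possible refinement that the sketch does not attempt.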
31 Overview Introduction First Experiments NLP Teaching Conclusion 31
32 Conclusion UIMA: easy to learn and to handle; supports the management of different annotations and different processing resources; integration of external resources (processing resources as well as lexical resources); splitting of 'processing steps': reader, initializer, analysis engine, consumer. 'Wish-list': a kind of JAPE transducer; interface to GATE's processing resources is available; 'comparator' for evaluation of solutions 32
More informationCHAPTER 6. Organizing Your Development Project. All right, guys! It s time to clean up this town!
CHAPTER 6 Organizing Your Development Project All right, guys! It s time to clean up this town! Homer Simpson In this book we describe how to build applications that are defined by the J2EE specification.
More informationLanguage Resources and Linked Data
Integrating NLP with Linked Data: the NIF Format Milan Dojchinovski @EKAW 2014 November 24-28, 2014, Linkoping, Sweden milan.dojchinovski@fit.cvut.cz - @m1ci - http://dojchinovski.mk Web Intelligence Research
More informationPrecise Medication Extraction using Agile Text Mining
Precise Medication Extraction using Agile Text Mining Chaitanya Shivade *, James Cormack, David Milward * The Ohio State University, Columbus, Ohio, USA Linguamatics Ltd, Cambridge, UK shivade@cse.ohio-state.edu,
More informationRecovering Traceability Links between an API and Its Learning Resources
Recovering Traceability Links between an API and Its Learning Resources Barthélémy Dagenais and Martin P. Robillard School of Computer Science McGill University Montréal, QC, Canada {bart,martin}@cs.mgill.ca
More informationThe Muc7 T Corpus. 1 Introduction. 2 Creation of Muc7 T
The Muc7 T Corpus Katrin Tomanek and Udo Hahn Jena University Language & Information Engineering (JULIE) Lab Friedrich-Schiller-Universität Jena, Germany {katrin.tomanek udo.hahn}@uni-jena.de 1 Introduction
More informationDomain Independent Knowledge Base Population From Structured and Unstructured Data Sources
Domain Independent Knowledge Base Population From Structured and Unstructured Data Sources Michelle Gregory, Liam McGrath, Eric Bell, Kelly O Hara, and Kelly Domico Pacific Northwest National Laboratory
More informationREPROTOOL Workflow (Textual documents in SW development) D3S Seminar
REPROTOOL Workflow (ual documents in SW development) D3S Seminar 2011-04-27 http://d3s.mff.cuni.cz Viliam Šimko simko@d3s.mff.cuni.cz CHARLES UNIVERSITY IN PRAGUE faculty of mathematics and physics ual
More informationInstant Content Creator. User Guide
Instant Content Creator User Guide Table of contents: 1 INTRODUCTION...4 1.1 Installation Procedure...4 2 INSTANT CONTENT CREATOR INTERFACE...7 3 CREATING A NEW PROJECT...9 4 ENTERING THE NAME OF THE PRODUCT...10
More informationReproducible & Transparent Computational Science with Galaxy. Jeremy Goecks The Galaxy Team
Reproducible & Transparent Computational Science with Galaxy Jeremy Goecks The Galaxy Team 1 Doing Good Science Previous talks: performing an analysis setting up and scaling Galaxy adding tools libraries
More informationFor convenience in typing examples, we can shorten the wordnet name to wn.
NLP Lab Session Week 14, December 4, 2013 More Semantics: WordNet similarity in NLTK and LDA Mallet demo More on Final Projects: weka memory and loading Spam documents Getting Started For the final projects,
More informationClearTK Tutorial. Steven Bethard. Mon 11 Jun University of Colorado Boulder
ClearTK Tutorial Steven Bethard University of Colorado Boulder Mon 11 Jun 2012 What is ClearTK? Framework for machine learning in UIMA components Feature extraction from CAS Common classifier interface
More informationTutorial to QuotationFinder_0.4.4
Tutorial to QuotationFinder_0.4.4 What is Quotation Finder and for which purposes can it be used? Quotation Finder is a tool for the automatic comparison of fully digitized texts. It can detect quotations,
More informationBuilding the Multilingual Web of Data. Integrating NLP with Linked Data and RDF using the NLP Interchange Format
Building the Multilingual Web of Data Integrating NLP with Linked Data and RDF using the NLP Interchange Format Presenter name 1 Outline 1. Introduction 2. NIF Basics 3. NIF corpora 4. NIF tools & services
More informationEclipse Support for Using Eli and Teaching Programming Languages
Electronic Notes in Theoretical Computer Science 141 (2005) 189 194 www.elsevier.com/locate/entcs Eclipse Support for Using Eli and Teaching Programming Languages Anthony M. Sloane 1,2 Department of Computing
More informationIn this tutorial, we will understand how to use the OpenNLP library to build an efficient text processing service.
About the Tutorial Apache OpenNLP is an open source Java library which is used process Natural Language text. OpenNLP provides services such as tokenization, sentence segmentation, part-of-speech tagging,
More informationAid to spatial navigation within a UIMA annotation index
Aid to spatial navigation within a UIMA annotation index Nicolas Hernandez Université de Nantes Abstract. In order to support the interoperability within UIMA workflows, we address the problem of accessing
More informationShrey Patel B.E. Computer Engineering, Gujarat Technological University, Ahmedabad, Gujarat, India
International Journal of Scientific Research in Computer Science, Engineering and Information Technology 2018 IJSRCSEIT Volume 3 Issue 3 ISSN : 2456-3307 Some Issues in Application of NLP to Intelligent
More information