University of Sheffield, NLP GATE: Bridging the Gap between Terminology and Linguistics
1 GATE: Bridging the Gap between Terminology and Linguistics Diana Maynard University of Sheffield, UK
2 Why do terminologists need GATE? Terminologists face the problem of lack of suitable tools to process their data. Lots of in-house tools for doing individual things Lack of common tools that can be used collaboratively and across different systems and domains. Tools must be flexible, robust and able to adapt to different processing tasks and languages GATE and its components are a key tool in today's world of information and data overload Enable users to perform tasks such as document management, business intelligence, information retrieval, question answering, and knowledge indexing, modelling and conceptualisation.
3 GATE can help terminologists: Save time and money on management of text and data from multiple sources Find hidden links scattered across huge volumes of diverse information Integrate structured data from a variety of sources Interlink text and data Collect information and extract new facts
4 A vision for text mining It is difficult to access unstructured information efficiently IE automates extraction of facts from text at reasonable accuracy and cost, increasing the value and utility of unstructured content Interlinking of text and data enables more efficient search, navigation and querying Text analysis is a matter of engineering: GATE offers practical solutions able to match specific requirements
5 Threat tracking application
6 Text mining and semantic annotation Extract structured data from text by Linking references to entities Linking entities to their semantic descriptions Automatic semantic annotation based on IE technology Attaches metadata to documents, which can be used for searching and hyperlinking Adds value to content of libraries, enabling user interaction with content Enhanced capability for cross-referencing and dynamic document classification
7 Semantic Annotation
8 Semantic Annotation of Entities Recognition of the type of the entities in the text from a rich taxonomy of classes Reference to their semantic description. Traditional NE recognition approach results in: <Person>Lama Ole Nydahl</Person> Semantic Annotation of NEs results in: <ReligiousPerson ID= > Lama Ole Nydahl </ReligiousPerson>
9 GATE: the Swiss Army Knife of NLP Has an attachment for almost every eventuality Some are hard to prise open Some are useful, but you might have to put up with a bit of clunkiness in practice Some will only be useful once in a lifetime, but you're glad to have them just in case. There are many imitations, but nothing like the real thing.
10 History of GATE early 1990s: you want me to write that all over again? : first GATE (and "large-scale IE") project 1996: GATE 1: Tcl/Tk, Perl, C++, : release of completely rewritten version 2, 100% Java 2009: mature ecosystem with established community Tens of thousands of research users 25,000 downloads per year commercial users getting serious
11 GATE is very eco-friendly!
12 GATE commercial users Typical commercial uses: dynamic search and indexing of repositories finding relations between elements in distributed repositories aggregating information from different text sources populating repositories fact finding from distributed knowledge sources Typical users: Pharmaceutics, news, intelligence (business, competitor, government, etc.), manufacturing, telecommunications
21 So what exactly is GATE? An architecture: A macro-level organisational picture for HLT software systems. A framework: For programmers, GATE is an object-oriented class library that implements the architecture. A development environment: For language engineers, computational linguists et al, a graphical development environment. A community of users and contributors
22 Architectural principles Non-prescriptive, theory neutral (strength and weakness) Re-use, interoperation, not reimplementation (e.g. diverse XML support, integration of Protégé, Jena, Yale...) (Almost) everything is a component, and component sets are user-extendable (Almost) all operations are available both from API and GUI
23 In short GATE includes: components for language processing, e.g. parsers, machine learning tools, stemmers, IR tools, IE components for various languages... tools for visualising and manipulating text, annotations, ontologies, parse trees, etc. various information extraction tools evaluation and benchmarking tools
24 Algorithms + Data + GUI = Applications GATE components are one of three types: Language Resources (LRs), e.g. lexicons, corpora, ontologies Processing Resources (PRs), e.g. parsers, generators, taggers Visual Resources (VRs), i.e. visualisation and editing components Algorithms are separated from the data, which means: the two can be developed independently by users with different expertise. alternative resources of one type can be used without affecting the other, e.g. a different visual resource can be used with the same language resource
25 But isn't GATE just about IE? Many people think of GATE as an IE tool IE is its primary function, but it also does a lot more Pretty much any kind of linguistic processing can be done in GATE The only field we really don't cover is Machine Translation, but you could easily add components for that if you wanted More about the other functionality later, but now back to IE...
26 Two Approaches to IE Knowledge Engineering: rule based; developed by experienced language engineers; make use of human intuition; obtain marginally better performance; development could be very time consuming; some changes may be hard to accommodate. Learning Systems: use statistics or other machine learning; developers do not need LE expertise; require large amounts of annotated training data; some changes may require re-annotation of the entire training corpus.
27 Named Entity Recognition Named Entity recognition is the cornerstone of IE Identification of proper names in texts, and classification into a set of predefined categories of interest. Three universally accepted categories: person, location and organisation Other common tasks: recognition of date/time expressions, measures (percent, money, weight etc), addresses etc. Other domain-specific entities: names of drugs, medical conditions, names of ships, bibliographic references etc.
28 ANNIE ANNIE is GATE's rule-based IE system It uses the language engineering approach (though we also have tools in GATE for ML) Distributed as part of GATE Uses a finite-state pattern-action rule language, JAPE More on JAPE later... ANNIE contains a reusable and easily extendable set of components: generic preprocessing components for tokenisation, sentence splitting etc components for performing NE on general open domain text
29 ANNIE Modules
30 Unicode Tokeniser Bases tokenisation on Unicode character classes Language-independent tokenisation Declarative token specification language, e.g.: "UPPERCASE_LETTER" "LOWERCASE_LETTER"* > Token; orthography=upperinitial; kind=word Identifies words, numbers, spaces, different classes of punctuation, orthography Recognition deliberately basic so that more powerful tools (JAPE) can be used for finer distinctions; this also gives greater reuse possibilities
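To make the idea of tokenising purely on Unicode character classes concrete, here is a minimal Python sketch. It is not GATE's tokeniser or its rule language: the function names, the grouping strategy, and the orthography values other than `upperinitial` are assumptions made for illustration.

```python
import unicodedata


def classify(ch):
    """Map a character to a coarse token kind via its Unicode general category."""
    cat = unicodedata.category(ch)
    if cat.startswith("L"):          # any letter, in any script
        return "word"
    if cat == "Nd":                  # decimal digit
        return "number"
    if cat.startswith("Z") or ch.isspace():
        return "space"
    return "punctuation"


def orthography(word):
    """Coarse capitalisation pattern (value names are illustrative assumptions)."""
    if word[0].isupper() and (len(word) == 1 or word[1:].islower()):
        return "upperinitial"
    if word.isupper():
        return "allcaps"
    if word.islower():
        return "lowercase"
    return "mixedcaps"


def tokenise(text):
    """Return (start, end, features) triples covering the whole text."""
    tokens = []
    i = 0
    while i < len(text):
        kind = classify(text[i])
        j = i + 1
        # Runs of letters, digits or spaces form one token;
        # each punctuation character is a token of its own.
        if kind != "punctuation":
            while j < len(text) and classify(text[j]) == kind:
                j += 1
        features = {"kind": kind, "string": text[i:j]}
        if kind == "word":
            features["orthography"] = orthography(text[i:j])
        tokens.append((i, j, features))
        i = j
    return tokens


if __name__ == "__main__":
    for start, end, features in tokenise("Sheffield, UK 2009"):
        print(start, end, features)
```

Because the classification relies only on Unicode categories, the same function handles text in other scripts without change, which is the language-independence point made on the slide.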
31 Gazetteer Set of lists compiled into Finite State Machines 60k entries in 80 types List entries are matched in the text as Lookup annotations Each list has some pre-defined features, which enable different kinds of matches to be identified Additional arbitrary features and values can be added to individual list entries Entries can be matched according to root forms, or more flexibly based on e.g. edit distance
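A rough Python sketch of list-based lookup follows. It stands in a compiled regular-expression alternation for the finite state machines mentioned on the slide, and the list contents and the helper names `build_index` and `lookup` are hypothetical; only the idea of emitting Lookup annotations with type features comes from the slide.

```python
import re

# Hypothetical gazetteer lists, keyed by (majorType, minorType).
GAZETTEER = {
    ("location", "city"): ["Sheffield", "New York", "York"],
    ("organization", "company"): ["Whitbread"],
}


def build_index(lists):
    """Flatten the lists into entry -> feature map."""
    index = {}
    for (major, minor), entries in lists.items():
        for entry in entries:
            index[entry] = {"majorType": major, "minorType": minor}
    return index


def lookup(text, index):
    """Return (start, end, type, features) tuples for every list entry found."""
    # Longest entries first, so "New York" is preferred over "York".
    alternatives = "|".join(
        re.escape(entry) for entry in sorted(index, key=len, reverse=True)
    )
    regex = re.compile(r"\b(?:%s)\b" % alternatives)
    return [
        (m.start(), m.end(), "Lookup", index[m.group()])
        for m in regex.finditer(text)
    ]


if __name__ == "__main__":
    idx = build_index(GAZETTEER)
    for annotation in lookup("Whitbread opened in New York and Sheffield.", idx):
        print(annotation)
```

This sketch does exact string matching only; matching on root forms or edit distance, as the slide describes, would need an extra normalisation or fuzzy-matching step.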
33 Limitations of gazetteers Gazetteer lists are designed for annotating simple, regular features Some flexibility is provided, but this is not enough for most tasks Recognising addresses using just a gazetteer would be impossible But combined with other linguistic pre-processing results, we have a whole lot of annotations and features POS tags, capitalisation, punctuation, lookup features, etc can all be combined to form patterns suggesting more complex information Luckily, we have JAPE to take care of this.
34 What is JAPE? a Jolly and Pleasant Experience Specially developed pattern matching language for GATE Each JAPE rule consists of LHS which contains patterns to match RHS which details the annotations (and optionally features) to be created JAPE rules combine to create a phase Rule priority based on pattern length, rule status and rule ordering Phases combine to create a grammar
35 Named Entity Grammars Hand-coded rules written in JAPE applied to annotations to identify NEs Phases run sequentially and constitute a cascade of FSTs over annotations Annotations from format analysis, tokeniser, splitter, POS tagger, morphological analysis, gazetteer etc. Because phases are sequential, annotations can be built up over a period of phases, as new information is gleaned Standard named entities: persons, locations, organisations, dates, addresses, money Basic NE grammars can be adapted for new applications, domains and languages
36 JAPE example University of Sheffield Rule: nameduniversity ( {Token.string == "University"} {Token.string == "of"} ({Lookup.minorType == city} | ({Token.category == NNP})+ ) ):orgname --> :orgname.organisation = {kind = "university", rule = "nameduniversity"} Looks for the specific words "University of" followed by: a city name from the gazetteer, or one or more proper nouns
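The matching behaviour described on this slide can be approximated in a few lines of Python. This is a simplified illustration, not how JAPE executes rules: tokens are plain dictionaries, a city Lookup is assumed to sit on a single token (the hypothetical `lookup_minor` key), and positions are token indices rather than character offsets.

```python
def match_named_university(tokens):
    """Find 'University of' followed by a city Lookup or one or more NNP tokens.

    tokens: list of dicts with 'string', 'category' and, optionally,
    'lookup_minor' (a stand-in for a Lookup annotation's minorType).
    """
    annotations = []
    i = 0
    while i + 2 < len(tokens):
        if tokens[i]["string"] == "University" and tokens[i + 1]["string"] == "of":
            j = i + 2
            if tokens[j].get("lookup_minor") == "city":
                j += 1                                   # first alternative: city name
            else:
                while j < len(tokens) and tokens[j].get("category") == "NNP":
                    j += 1                               # second alternative: NNP+
            if j > i + 2:
                annotations.append({
                    "type": "organisation",
                    "start": i,
                    "end": j,
                    "features": {"kind": "university", "rule": "nameduniversity"},
                })
                i = j
                continue
        i += 1
    return annotations


if __name__ == "__main__":
    sentence = [
        {"string": "the", "category": "DT"},
        {"string": "University", "category": "NNP"},
        {"string": "of", "category": "IN"},
        {"string": "Sheffield", "category": "NNP", "lookup_minor": "city"},
        {"string": "is", "category": "VBZ"},
    ]
    print(match_named_university(sentence))
```

The left-hand side of the rule corresponds to the conditions inside the loop, and the right-hand side corresponds to the dictionary appended to `annotations`.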
37 Combining existing annotations Associate a company with a share price e.g. Whitbread shares closed up 2p at 645p. Phase: Shares Input: Token Organization Lookup Money Percent Options: control = appelt Rule: ShareChange ( {Organization} ({Token})[0,3] {Lookup.majorType == "change"} ({Token})[0,3] ({Money} | {Percent}) ):change --> :change.sharechange = {rule = "ShareChange"}
38 Orthomatcher Orthographic coreference between annotations in the same document, e.g. Mr Brown, James Brown Matching rules are invoked between annotations of the same type, or between an existing annotation and an Unknown annotation The latter is the only case where an annotation type can be changed Lookup tables of aliases and exceptions (i.e. overriding of matching rules) Also PRs for pronominal and nominal coreference
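As a rough illustration of orthographic coreference, the following Python sketch groups name mentions that agree on their final token once titles are dropped. The matching rule, the title list and the function names are assumptions for this example; the actual Orthomatcher applies a larger rule set plus alias and exception tables.

```python
TITLES = {"mr", "mrs", "ms", "dr", "prof"}   # illustrative, not exhaustive


def name_tokens(name):
    """Split a mention into tokens, dropping full stops and titles."""
    return [t for t in name.replace(".", "").split() if t.lower() not in TITLES]


def orthographic_match(a, b):
    """True if the shorter name is contained in the longer and surnames agree."""
    ta, tb = name_tokens(a), name_tokens(b)
    if not ta or not tb:
        return False
    shorter, longer = sorted((ta, tb), key=len)
    longer_lower = [t.lower() for t in longer]
    return (
        shorter[-1].lower() == longer[-1].lower()
        and all(t.lower() in longer_lower for t in shorter)
    )


def coreference_chains(mentions):
    """Greedily group mentions of the same annotation type into chains."""
    chains = []
    for mention in mentions:
        for chain in chains:
            if any(orthographic_match(mention, other) for other in chain):
                chain.append(mention)
                break
        else:
            chains.append([mention])
    return chains


if __name__ == "__main__":
    print(coreference_chains(["Mr Brown", "James Brown", "Diana Maynard", "Brown"]))
```

A table of exceptions, as mentioned on the slide, could be added as a set of name pairs checked before `orthographic_match` is applied.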
39 What about other languages? Since we're based in Sheffield, you can't blame us for developing GATE primarily for English But contrary to popular belief about the British, we don't hate all foreigners! And we have lots of capabilities for processing in other languages Currently systems for English, French, German, Romanian, Bulgarian, Russian, Cebuano, Hindi, Chinese, Arabic You have a POS tagger for Swahili? Just add it as a plugin and combine it with existing tokeniser etc.
40 It's all Chinese to me...
41 Processing multiple languages If you have a language identifier PR, you can combine processing of texts in different languages in a single application The system will choose the right PRs for each document or document section Conditional application fires a PR if some condition is met
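The conditional idea can be sketched in Python as a pipeline of (processing resource, condition) pairs. Everything here is hypothetical scaffolding: the toy stopword-based language identifier, the tagger functions and the `run_conditional` helper are not part of the GATE API.

```python
# Toy stopword lists for a stand-in language identifier (assumption).
STOPWORDS = {
    "en": {"the", "and", "of"},
    "fr": {"le", "la", "et", "des"},
}


def language_identifier(doc):
    """Set doc['features']['lang'] to the language with the most stopword hits."""
    words = doc["text"].lower().split()
    scores = {
        lang: sum(word in stopwords for word in words)
        for lang, stopwords in STOPWORDS.items()
    }
    doc["features"]["lang"] = max(scores, key=scores.get)


def english_tagger(doc):
    doc["features"]["tagged_by"] = "english_tagger"


def french_tagger(doc):
    doc["features"]["tagged_by"] = "french_tagger"


def run_conditional(pipeline, doc):
    """Run each PR in order, but only if its condition holds (None = always)."""
    for pr, condition in pipeline:
        if condition is None or condition(doc):
            pr(doc)
    return doc


PIPELINE = [
    (language_identifier, None),
    (english_tagger, lambda d: d["features"].get("lang") == "en"),
    (french_tagger, lambda d: d["features"].get("lang") == "fr"),
]

if __name__ == "__main__":
    document = {"text": "le chat et la souris", "features": {}}
    print(run_conditional(PIPELINE, document)["features"])
```

The same application can then be run over a mixed-language corpus, with each document routed to the PRs that suit it.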
42 Other plugins Parsers (Stanford, MiniPar, RASP, SUPPLE) More flexible gazetteers Specialised NE (Chemistry, Biomedicine, etc) PRs for other languages, Alignment Lemmatisers, morphological analyser, NP and VP chunkers Machine Learning Evaluation toolkit including IAA IR, Google and Yahoo search engines, web crawlers WordNet Whole host of ontology-based tools
43 Alignment plugin
44 GATE in use We have dozens of applications, not all just research projects! A few examples...
45 Semantic Annotation Adding information to documents that is usable by machines to enable better presentation, navigation or searching, e.g. Perseus:
47 Indexing news at the BBC BBC Archives: 'Newsnight' archiving time is 8 hours per hour Automatic transcription to extract some potential indexing terms Result: temporally precise, but very noisy data Partial solution: search the web, intranet, digital library for related pages, and process with IE/SA Result: less noisy but temporally imprecise So we merge this information with the speech signal data Result: works well for easy stuff (high precision, low recall)
49 Ontology linking at FAO FAO have sets of fisheries-related ontologies, e.g. Gear, species, fishing areas No way to link between them using ontology alignment techniques, because we require information external to the ontology (fish lives in a particular area) NLP techniques make use of information from documents which provide this missing link Not always an exact match between text and the ontology elements, e.g. Mummichogs vs. fundulus heteroclitus Use techniques such as headword matching, noun phrase chunking, synonym and acronym finding, etc Find relations in the text to link the entities together
50 Ontology linking at FAO [Diagram relating Species, Fishing Gear, Fishing Area and Commodities through the relations caught_by, found_in and basis_of]
51 Matching text descriptions Find NPs and terms; use OntoRootGazetteer to find morphological variants of ontology elements, perform headword and synonym matching etc. Pelagic species, mainly fish and cephalopods, northern shrimp (also small crustaceans, krill) Match text span to ontology instance, retaining URIs Create annotations and features, e.g. caught_by = {gear_type = midwater otter trawls target_species = cephalopods} Convert to RDF triples
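The final step, turning a relation annotation into RDF, might look like the Python sketch below. The namespace, the URI-minting scheme and the choice of the species as subject are assumptions for illustration; in the workflow on the slide the URIs are retained from the matched ontology instances rather than generated from strings.

```python
BASE = "http://example.org/fisheries#"   # hypothetical namespace


def uri(local_name):
    """Mint an N-Triples URI reference from a label (illustrative only)."""
    return "<%s%s>" % (BASE, local_name.strip().replace(" ", "_"))


def to_triples(annotation):
    """Convert one relation annotation into a list of N-Triples lines."""
    relation = annotation["relation"]
    features = annotation["features"]
    subject = uri(features["target_species"])
    obj = uri(features["gear_type"])
    return ["%s %s %s ." % (subject, uri(relation), obj)]


if __name__ == "__main__":
    example = {
        "relation": "caught_by",
        "features": {
            "gear_type": "midwater otter trawls",
            "target_species": "cephalopods",
        },
    }
    for line in to_triples(example):
        print(line)
```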
53 Using ANNIC to view results
54 Outsmarting our competitors
55 If you can't beat 'em, join 'em UIMA OpenCalais Lingpipe All integrated into GATE as plugins
56 UIMA UIMA is an NL engineering platform developed by IBM Shares some functionality with GATE, but is complementary in most respects. Interoperability layer has been developed to allow UIMA applications to be run within GATE, and vice versa, in order to combine elements of both. Emphasis is on architectural support, including asynchronous scaleout (deploying many copies of an application in parallel) Much narrower range of resources provided than GATE
57 OpenCalais Web service for semantic annotation of text. The user submits a document to the web service, which returns entity and relations annotations in RDF, JSON or some other format. Typically, users integrate OpenCalais annotation of their web pages to provide additional links and semantic functionality. OpenCalais annotates both relations and entities, although the GATE plugin only supports entities.
58 LingPipe Provides a set of IE and data mining tools, largely ML-based. Has a set of models trained for particular tasks/corpora. Limited ontology support: can connect entities found to databases and ontologies Advantage: ML models can suggest more than one output, ranked by confidence. The user can choose the number of suggestions generated. Disadvantage: ML models only apply to specific tasks and domains.
59 In summary... We like to think GATE is the best thing since sliced bread for most NLP and terminology tasks You can use it for plenty of other things too, don't let us stop you being creative! Incorporates huge number of plugins, is easily extendable and highly customisable The only limit is your imagination... So if you're now convinced you can't live without GATE, there are two possibilities: ask us to get involved with a project try GATE yourself
60 Get your own hands dirty We run 3x yearly training courses in Sheffield and other selected locations Different tracks available GATE certification available
61 More info, contact details, demos, publications: Now it's time to nudge your neighbour if they are asleep... Or ask that burning question about GATE.
NLP Chain Giuseppe Castellucci castellucci@ing.uniroma2.it Web Mining & Retrieval a.a. 2013/2014 Outline NLP chains RevNLT Exercise NLP chain Automatic analysis of texts At different levels Token Morphological
More informationSemantics Isn t Easy Thoughts on the Way Forward
Semantics Isn t Easy Thoughts on the Way Forward NANCY IDE, VASSAR COLLEGE REBECCA PASSONNEAU, COLUMBIA UNIVERSITY COLLIN BAKER, ICSI/UC BERKELEY CHRISTIANE FELLBAUM, PRINCETON UNIVERSITY New York University
More informationKnowledge Engineering with Semantic Web Technologies
This file is licensed under the Creative Commons Attribution-NonCommercial 3.0 (CC BY-NC 3.0) Knowledge Engineering with Semantic Web Technologies Lecture 5: Ontological Engineering 5.3 Ontology Learning
More informationSemantic Web Company. PoolParty - Server. PoolParty - Technical White Paper.
Semantic Web Company PoolParty - Server PoolParty - Technical White Paper http://www.poolparty.biz Table of Contents Introduction... 3 PoolParty Technical Overview... 3 PoolParty Components Overview...
More information<is web> Information Systems & Semantic Web University of Koblenz Landau, Germany
Information Systems & University of Koblenz Landau, Germany Semantic Search examples: Swoogle and Watson Steffen Staad credit: Tim Finin (swoogle), Mathieu d Aquin (watson) and their groups 2009-07-17
More informationLarge-scale, Parallel Automatic Patent Annotation
Overview Large-scale, Parallel Automatic Patent Annotation Thomas Heitz & GATE Team Computer Science Dept. - NLP Group - Sheffield University Patent Information Retrieval 2008 30 October 2008 T. Heitz
More informationOwlExporter. Guide for Users and Developers. René Witte Ninus Khamis. Release 2.1 December 26, 2010
OwlExporter Guide for Users and Developers René Witte Ninus Khamis Release 2.1 December 26, 2010 Semantic Software Lab Concordia University Montréal, Canada http://www.semanticsoftware.info Contents 1
More informationQuestion Answering Using XML-Tagged Documents
Question Answering Using XML-Tagged Documents Ken Litkowski ken@clres.com http://www.clres.com http://www.clres.com/trec11/index.html XML QA System P Full text processing of TREC top 20 documents Sentence
More informationUIMA-based Annotation Type System for a Text Mining Architecture
UIMA-based Annotation Type System for a Text Mining Architecture Udo Hahn, Ekaterina Buyko, Katrin Tomanek, Scott Piao, Yoshimasa Tsuruoka, John McNaught, Sophia Ananiadou Jena University Language and
More informationAn Approach To Web Content Mining
An Approach To Web Content Mining Nita Patil, Chhaya Das, Shreya Patanakar, Kshitija Pol Department of Computer Engg. Datta Meghe College of Engineering, Airoli, Navi Mumbai Abstract-With the research
More informationDepartment of Electronic Engineering FINAL YEAR PROJECT REPORT
Department of Electronic Engineering FINAL YEAR PROJECT REPORT BEngCE-2007/08-HCS-HCS-03-BECE Natural Language Understanding for Query in Web Search 1 Student Name: Sit Wing Sum Student ID: Supervisor:
More informationToward a Knowledge-Based Solution for Information Discovery in Complex and Dynamic Domains
Toward a Knowledge-Based Solution for Information Discovery in Complex and Dynamic Domains Eloise Currie and Mary Parmelee SAS Institute, Cary NC About SAS: The Power to Know SAS: The Market Leader in
More informationCS473: Course Review CS-473. Luo Si Department of Computer Science Purdue University
CS473: CS-473 Course Review Luo Si Department of Computer Science Purdue University Basic Concepts of IR: Outline Basic Concepts of Information Retrieval: Task definition of Ad-hoc IR Terminologies and
More information0.1 Knowledge Organization Systems for Semantic Web
0.1 Knowledge Organization Systems for Semantic Web 0.1 Knowledge Organization Systems for Semantic Web 0.1.1 Knowledge Organization Systems Why do we need to organize knowledge? Indexing Retrieval Organization
More informationBackground and Context for CLASP. Nancy Ide, Vassar College
Background and Context for CLASP Nancy Ide, Vassar College The Situation Standards efforts have been on-going for over 20 years Interest and activity mainly in Europe in 90 s and early 2000 s Text Encoding
More informationSemantic annotation toolkit (first version)
www.kconnect.eu Semantic annotation toolkit (first version) Deliverable number D1.1 Dissemination level Public Delivery date 31 October 2015 Status Author(s) Final Ian Roberts, Fredrik Axelsson, Zoltan
More informationA cocktail approach to the VideoCLEF 09 linking task
A cocktail approach to the VideoCLEF 09 linking task Stephan Raaijmakers Corné Versloot Joost de Wit TNO Information and Communication Technology Delft, The Netherlands {stephan.raaijmakers,corne.versloot,
More informationDomain-specific Concept-based Information Retrieval System
Domain-specific Concept-based Information Retrieval System L. Shen 1, Y. K. Lim 1, H. T. Loh 2 1 Design Technology Institute Ltd, National University of Singapore, Singapore 2 Department of Mechanical
More informationText Mining: A Burgeoning technology for knowledge extraction
Text Mining: A Burgeoning technology for knowledge extraction 1 Anshika Singh, 2 Dr. Udayan Ghosh 1 HCL Technologies Ltd., Noida, 2 University School of Information &Communication Technology, Dwarka, Delhi.
More informationPrecise Medication Extraction using Agile Text Mining
Precise Medication Extraction using Agile Text Mining Chaitanya Shivade *, James Cormack, David Milward * The Ohio State University, Columbus, Ohio, USA Linguamatics Ltd, Cambridge, UK shivade@cse.ohio-state.edu,
More informationA Linguistic Approach for Semantic Web Service Discovery
A Linguistic Approach for Semantic Web Service Discovery Jordy Sangers 307370js jordysangers@hotmail.com Bachelor Thesis Economics and Informatics Erasmus School of Economics Erasmus University Rotterdam
More informationLinked Data and cultural heritage data: an overview of the approaches from Europeana and The European Library
Linked Data and cultural heritage data: an overview of the approaches from Europeana and The European Library Nuno Freire Chief data officer The European Library Pacific Neighbourhood Consortium 2014 Annual
More informationModule 9: Ontologies and Semantic Annotation
Module 9: Ontologies and Semantic Annotation The University of Sheffield, 1995-2012 This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike Licence About this tutorial This
More informationSyntax and Grammars 1 / 21
Syntax and Grammars 1 / 21 Outline What is a language? Abstract syntax and grammars Abstract syntax vs. concrete syntax Encoding grammars as Haskell data types What is a language? 2 / 21 What is a language?
More informationNew Media Analysis Using Focused Crawl and Natural Language Processing: Case of Lithuanian News Websites
New Media Analysis Using Focused Crawl and Natural Language Processing: Case of Lithuanian News Websites Tomas Krilavičius Žygimantas Medelis Jurgita Kapočiūtė-Dzikienė Tomas Žalandauskas Problem How to
More informationMaca a configurable tool to integrate Polish morphological data. Adam Radziszewski Tomasz Śniatowski Wrocław University of Technology
Maca a configurable tool to integrate Polish morphological data Adam Radziszewski Tomasz Śniatowski Wrocław University of Technology Outline Morphological resources for Polish Tagset and segmentation differences
More informationChapter 27 Introduction to Information Retrieval and Web Search
Chapter 27 Introduction to Information Retrieval and Web Search Copyright 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 27 Outline Information Retrieval (IR) Concepts Retrieval
More informationDeliverable D1.4 Report Describing Integration Strategies and Experiments
DEEPTHOUGHT Hybrid Deep and Shallow Methods for Knowledge-Intensive Information Extraction Deliverable D1.4 Report Describing Integration Strategies and Experiments The Consortium October 2004 Report Describing
More informationA Multilingual Social Media Linguistic Corpus
A Multilingual Social Media Linguistic Corpus Luis Rei 1,2 Dunja Mladenić 1,2 Simon Krek 1 1 Artificial Intelligence Laboratory Jožef Stefan Institute 2 Jožef Stefan International Postgraduate School 4th
More informationDeliverable D Adapted tools for the QTLaunchPad infrastructure
This document is part of the Coordination and Support Action Preparation and Launch of a Large-scale Action for Quality Translation Technology (QTLaunchPad). This project has received funding from the
More informationJumpstarting the Semantic Web
Jumpstarting the Semantic Web Mark Watson. Copyright 2003, 2004 Version 0.3 January 14, 2005 This work is licensed under the Creative Commons Attribution-NoDerivs-NonCommercial License. To view a copy
More informationGetting Lost in Semantics Selecting the Right Search Engine
Getting Lost in Semantics Selecting the Right Search Engine Steve Mann VP Sales Concept Searching stevem@conceptsearching.com Robert Piddocke VP Channel and Business Development Concept Searching mikep@conceptsearching.com
More informationOverview of Web Mining Techniques and its Application towards Web
Overview of Web Mining Techniques and its Application towards Web *Prof.Pooja Mehta Abstract The World Wide Web (WWW) acts as an interactive and popular way to transfer information. Due to the enormous
More informationData for linguistics ALEXIS DIMITRIADIS. Contents First Last Prev Next Back Close Quit
Data for linguistics ALEXIS DIMITRIADIS Text, corpora, and data in the wild 1. Where does language data come from? The usual: Introspection, questionnaires, etc. Corpora, suited to the domain of study:
More informationTIC: A Topic-based Intelligent Crawler
2011 International Conference on Information and Intelligent Computing IPCSIT vol.18 (2011) (2011) IACSIT Press, Singapore TIC: A Topic-based Intelligent Crawler Hossein Shahsavand Baghdadi and Bali Ranaivo-Malançon
More information