WebAnno: a flexible, web-based annotation tool for CLARIN

Richard Eckart de Castilho, Chris Biemann, Iryna Gurevych, Seid Muhie Yimam
#WebAnno
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license. If you are interested in using this material under different conditions, please contact us.

WebAnno: an annotation tool for text
- Team tool
  - Allows a distributed team of annotators to work on a corpus
  - Supports different roles within the team (e.g. user / manager)
- Flexible
  - Multi-layer annotation with configurable annotation layers
  - Different annotation modes, including correction and learning modes
- Web-based
  - Available to annotators everywhere, no installation effort
  - All configuration performed through the web interface
- Platform independent
  - Platform-independent Java-based application
- Open source
  - Allows the community to participate
24.10.2014, Computer Science Department, UKP Lab, Richard Eckart de Castilho

WebAnno: an annotation tool for CLARIN
- Developed based on the requirements of CLARIN F-AG 7
  - Dipper et al. NoSta-D: A corpus of German non-standard varieties. Non-Standard Data Sources in Corpus-Based Research (2013): 69-76.
  - Benikova et al. NoSta-D Named Entity Annotation for German: Guidelines and Dataset. Proceedings of LREC. 2014.
- ... but also used beyond F-AG 7
  - Pedersen et al. Semantic Annotation of the Danish CLARIN Reference Corpus. Proceedings 10th Joint ISO-ACL SIGSEM Workshop on Interoperable Sem. Annotation. 2014.
- ... used and recognized beyond CLARIN
  - Search for WebAnno on Google Scholar
  - See our public users mailing list
- WebAnno is the first annotation tool to support WebLicht TCF
  - Worked with TCF developers to improve TCF support for updating files!
- The WebAnno team is constantly in touch with the community
  - Visit http://webanno.googlecode.com after the talk to participate in our survey!

Annotation examples
- Part-of-Speech & syntactic dependencies
- Named entities
- Co-reference

Main Menu
- Annotate texts from scratch
- Review and correct previously annotated documents
- Employ integrated machine learning capabilities
- Compare annotations from different annotators and merge them
- Assign workload to annotators and monitor their progress
- Create new projects
- Create new user accounts

Workflow of a WebAnno project
[Figure: project workflow diagram; the final step is labeled "EXPORT FINAL DATASET"]

Curation
[Figure: curation screen, with the following callouts]
- Curator's editor
- Display annotators
- Color-coded agreement
- Highlight sentences with disagreement
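The idea of highlighting sentences with disagreement can be illustrated with a small sketch. This is not WebAnno's implementation: it assumes each annotator's work is a mapping from (sentence index, begin token, end token) to a label, and the function name `disagreeing_sentences` and the sample data are hypothetical.

```python
from collections import defaultdict

# Hypothetical data: annotator -> {(sentence, begin_token, end_token): label}
annotations = {
    "anno1": {(0, 0, 1): "PER", (1, 2, 3): "LOC"},
    "anno2": {(0, 0, 1): "PER", (1, 2, 3): "ORG"},
}

def disagreeing_sentences(annotations):
    """Return sorted indices of sentences where annotators differ on any span."""
    # Collect, for every span, the label each annotator assigned to it.
    labels_by_span = defaultdict(dict)
    for annotator, spans in annotations.items():
        for span, label in spans.items():
            labels_by_span[span][annotator] = label

    flagged = set()
    for (sentence, _begin, _end), labels in labels_by_span.items():
        missing = len(labels) < len(annotations)      # some annotator skipped this span
        conflicting = len(set(labels.values())) > 1   # annotators chose different labels
        if missing or conflicting:
            flagged.add(sentence)
    return sorted(flagged)

print(disagreeing_sentences(annotations))  # [1]
```

A curator would then only need to inspect the flagged sentences; spans on which all annotators agree could be merged without review.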

Built-in layers vs. custom layers
- WebAnno offers various built-in annotation layers
  - User can immediately start annotating
  - Only linguistic layers
  - Layer semantics are known
- Custom layers allow WebAnno to be adapted to unforeseen tasks
  - Adapt to non-linguistic annotation tasks
  - Adapt to unforeseen linguistic annotation tasks
  - Layer semantics are unknown
- Import/export of annotated data
  - Layers with known semantics convert from/to many formats (TCF, CoNLL, ...)
  - Layers with unknown semantics convert from/to generic formats (XMI, ...)

Layer types
- Existing built-in layers were generalized into three layer types
  - Span layer: POS, lemma, named entity, ...
  - Relation layer: syntactic dependencies, ...
    - Attaches to span annotations
    - Directed, reversible arcs
  - Chain layer: co-reference chains, ...
    - Undirected arcs
- Layers can be further customized using behaviours
  - Character-based or token-based
  - Single/multiple token
  - Crossing of sentence boundaries
  - Stacking
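The three layer types and their behaviours can be pictured as a small data model. The following is a minimal sketch under stated assumptions, not WebAnno's actual type system: it is token-based only, and every class, field, and function name (`Span`, `Relation`, `Chain`, `SpanBehaviours`, `can_add`) is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass(frozen=True)
class Span:
    """Span layer annotation over token offsets [begin, end)."""
    begin: int
    end: int
    label: str

@dataclass
class Relation:
    """Relation layer annotation: a directed arc attached to two spans."""
    governor: Span
    dependent: Span
    label: str

    def reversed(self) -> "Relation":
        # Arcs are directed but reversible: swap the two endpoints.
        return Relation(self.dependent, self.governor, self.label)

@dataclass
class Chain:
    """Chain layer annotation: spans linked by undirected arcs."""
    label: str
    members: List[Span] = field(default_factory=list)

@dataclass
class SpanBehaviours:
    """Hypothetical switches mirroring the behaviours listed above."""
    multiple_tokens: bool = True   # may a span cover more than one token?
    cross_sentence: bool = False   # may a span cross a sentence boundary?
    stacking: bool = False         # may two spans share the same position?

def can_add(new: Span, existing: List[Span], sentence_ends: List[int],
            behaviours: SpanBehaviours) -> bool:
    """Check a new span against the configured behaviours."""
    if not behaviours.multiple_tokens and new.end - new.begin > 1:
        return False
    # A boundary strictly inside the span means it crosses sentences.
    if not behaviours.cross_sentence and any(new.begin < e < new.end for e in sentence_ends):
        return False
    if not behaviours.stacking and any(
        (s.begin, s.end) == (new.begin, new.end) for s in existing
    ):
        return False
    return True
```

For example, with the default behaviours a span covering tokens 2 to 4 is rejected when a sentence ends at token 3, and a second span on an already annotated position is rejected unless `stacking=True`.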

Custom layer examples
- Semantic predicates and arguments (span/relation)
- Person (span) / Relationship (relation)

Custom layer configuration
[Figure: layer configuration screen, with the following callouts]
- Features
- Layers
- Control behavior
- Controlled vocabulary

Integrated machine-learning
- Annotating data from scratch is more work than correcting
- WebAnno learns from pre-annotated data and makes suggestions
- Accept suggestions with a single click
- Correct suggestions to improve training data

Example: Chunking
[Figure: data-flow diagram with the following labels]
- Externally preannotated secondary data, Part-of-Speech training data, POS-tagger model, POS tagged text
- Data annotated in WebAnno, chunk-annotated documents
- Externally preannotated primary data, chunk training data, chunker model, chunk suggestions
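As we read the diagram labels, a POS tagger trained on secondary data supplies features for a chunker trained on primary data, and the chunker's output is offered as suggestions. The sketch below illustrates that two-stage arrangement only. The most-frequent-label lookup is a stand-in chosen for brevity, not the learner WebAnno uses, and all names and sample data are hypothetical.

```python
from collections import Counter, defaultdict

def train_lookup(pairs):
    """Train a most-frequent-label model from (feature, label) pairs."""
    counts = defaultdict(Counter)
    for feature, label in pairs:
        counts[feature][label] += 1
    return {f: c.most_common(1)[0][0] for f, c in counts.items()}

# Secondary training data: (token, POS) pairs, e.g. externally pre-annotated.
pos_data = [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB"),
            ("a", "DET"), ("cat", "NOUN")]
# Primary training data: (POS, chunk tag) pairs, e.g. annotated by hand.
chunk_data = [("DET", "B-NP"), ("NOUN", "I-NP"), ("VERB", "B-VP")]

pos_model = train_lookup(pos_data)
chunk_model = train_lookup(chunk_data)

def suggest_chunks(tokens):
    """Tag tokens with POS first, then suggest a chunk tag for each token."""
    suggestions = []
    for token in tokens:
        pos = pos_model.get(token.lower(), "NOUN")  # crude fallback for unseen words
        suggestions.append((token, pos, chunk_model.get(pos, "O")))
    return suggestions

print(suggest_chunks(["A", "dog", "barks"]))
# [('A', 'DET', 'B-NP'), ('dog', 'NOUN', 'I-NP'), ('barks', 'VERB', 'B-VP')]
```

In a correction workflow, suggestions that the annotator accepts or fixes would be appended to `chunk_data` and the chunker retrained, so the suggestions improve as more documents are annotated.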

Automation configuration
[Figure: automation configuration screen, with the following callouts]
- Secondary training data
- Primary training data
- Training data example

Deploy WebAnno as you need it
[Figure: deployment options, with the following labels]
- Personal workstation: click to start webanno-standalone.jar
- On-premise group server
- Migrate projects
- To come: cloud-based group server, CLARIN infrastructure service

Where we want to go from here
- Extend the scope of WebAnno
  - Support for slot-based annotation layers (semantic annotations)
  - Tagset constraints
  - Support for more built-in linguistic layers
- Improve continuously based on user feedback
  - More efficient annotation interface
  - Support for additional corpus formats
  - ... your feedback?
- Deploy as a CLARIN infrastructure service
  - CLARIN AAI support
  - Reduce administrative overhead for operators
  - Self-service for project managers

#WebAnno
Visit me in the demo session!
http://webanno.googlecode.com