from Pavel Mihaylov and Dorothee Beermann Reviewed by Sc o t t Fa r r a r, University of Washington

Similar documents
ELAN Linguistic Annotator

A Short Introduction to CATMA

One of the fundamental kinds of websites that SharePoint 2010 allows

VERSION 1.0, FEATURE PACK What s New SAP Enable Now

Creating Word Outlines from Compendium on a Mac

Module 1: Information Extraction

Blackboard Learn 9.1 Reference Terminology elearning Blackboard Learn 9.1 for Faculty

RIT Wiki 5.1 Upgrade - May 21, 2013

Annotation by category - ELAN and ISO DCR

Standard Professional Premium

Blackboard Portfolio System Owner and Designer Reference

EUDICO Linguistic Annotator (ELAN) from Max Planck Institute for Psycholinguistics

Pharos Designer 2. Copyright Pharos Architectural Controls (15/1/2015)

HOPE Project AAL Smart Home for Elderly People

California Open Online Library for Education & Accessibility

Useful Google Apps for Teaching and Learning

1704 SP2 CUSTOMER. What s New SAP Enable Now

Blog Pro for Magento 2 User Guide

Specification Manager

Managing references & bibliographies using Mendeley

ITaP Confluence Guide. Instructions for Getting Started with Confluence a Purdue

Loquendo TTS Director: The next generation prompt-authoring suite for creating, editing and checking prompts

Anchovy User Guide. Copyright Maxprograms

A personal research assistant. Inside your browser.

InSite Prepress Portal Quick Start Guide IPP 9.0

User s guide to using the ForeTees TinyMCE online editor. Getting started with TinyMCE and basic things you need to know!

User Guide. Copyright Wordfast, LLC All rights reserved.

National Training and Education Resource. Authoring Course. Participant Guide

Telerik Training for Mercury 3

KS Blogs Tutorial Wikipedia definition of a blog : Some KS Blog definitions: Recommendation:

Practice Labs User Guide

Exploring SharePoint Designer

Innovasys HelpStudio 3 Product Data Sheet

EECS 1710 SETTING UP A VIRTUAL MACHINE (for EECS labs)

Blackboard Collaborate Ultra

GSLIS Technology Orientation Requirement (TOR)

1. Download and install the Firefox Web browser if needed. 2. Open Firefox, go to zotero.org and click the big red Download button.

GSLIS Technology Orientation Requirement (TOR)

Space Details. Available Pages. Confluence Help Description: Last Modifier (Mod. Date): ljparkhi (Aug 14, 2008)

ClickLearn Studio. - A technical guide 9/18/2017

Creating Accessible PDFs

Agilix Buzz Accessibility Statement ( )

Lesson 2 page 1. ipad # 17 Font Size for Notepad (and other apps) Task: Program your default text to be smaller or larger for Notepad

The diverse software in the Adobe Creative Suite enables you to create

Telerik Training for Mercury 3

Zend Studio 3.0. Quick Start Guide

Form into function. Getting prepared. Tutorial. Paul Jasper

COPYRIGHTED MATERIAL. Part I: Getting Started. Chapter 1: Introducing Flex 2.0. Chapter 2: Introducing Flex Builder 2.0. Chapter 3: Flex 2.

User Guide. Copyright Wordfast, LLC All rights reserved.

Web Accessibility Checklist

edublogs ~ a WordPress Blog

Intro to XML. Borrowed, with author s permission, from:

The Related Items window accelerates review by providing a single place to view everything related to a document.

Space Details. Available Pages

HKUST WEBSITE GUIDELINES LAST UPDATED _ AUGUST 2018

IJSRD - International Journal for Scientific Research & Development Vol. 3, Issue 02, 2015 ISSN (online):

Web Site Documentation Eugene School District 4J

User Guide. Copyright Wordfast, LLC All rights reserved.

Best practices in the design, creation and dissemination of speech corpora at The Language Archive

Advanced Topics in Curricular Accessibility: Strategies for Math and Science Accessibility

Create-A-Page Design Documentation

Tennessee. Business Technology Course Code Web Design Essentials. HTML Essentials, Second Edition 2010

Specification Manager

How to.. What is the point of it?

Blackboard staff how to guide Accessible Course Design

Chapter 2. Architecture of a Search Engine

Getting Started with Zoom

Website Development Komodo Editor and HTML Intro

Enterprise Architect. User Guide Series. Ribbons. Author: Sparx Systems Date: 27/05/2016 Version: 1.0 CREATED WITH

A Guided Tour of Doc-To-Help

Installation Guide Web Browser Method

Wikispaces in Education A Comprehensive Tutorial

Vim: A great tool for your toolbox! 1/13

Enterprise Architect. User Guide Series. Ribbons. Author: Sparx Systems Date: 15/07/2016 Version: 1.0 CREATED WITH

Educational Fusion. Implementing a Production Quality User Interface With JFC

Browsing the Semantic Web

National Body Priority MI Topic MI Theme NB Comment. NB_Lookup

Online Accessibility Guidelines

Introduction to HTML

Typhon Group Website WCAG 2.0 Support Statement

The software for this server was created by Floris van Vugt (programmer) and Alexis Dimitriadis, for the Berlin-Utrecht Reciprocals Survey.

ELAR: instructions for depositors

SAP Enable Now What s New. WHAT S NEW PUBLIC Version 1.0, Feature Pack SAP Enable Now What s New. Introduction PUBLIC 1

Below, we will walk through the three main elements of the algorithm, which include Domain Attributes, On-Page and Off-Page factors.

Concordance Basics. Part I

BOLT eportfolio Student Guide

California Open Online Library for Education & Accessibility

12/3/ Introduction to CenterStage Spaces and roles. My Community My Spaces, My Favorite Spaces

California Open Online Library for Education & Accessibility

V12 Highlights. What s new in Richmond ServiceDesk V12?

3D PDF Plug-ins for Autodesk products Version 2.0

Click the buttons in the interactive below to become familiar with the drawing tool's commands.

eportfolio Support Guide

mw:translate: User workflow design

Introduction to Topologi Markup Editor , 2005 Topologi Pty. Ltd.

Selecting An XML File. Boris Job Slate

California Open Online Library for Education & Accessibility

Week - 01 Lecture - 04 Downloading and installing Python

Kaltura Video Package for Moodle 2.x Quick Start Guide. Version: 3.1 for Moodle

Optimized design of customized KML files

Transcription:

Vol. 4 (2010), pp. 60-65 http://nflrc.hawaii.edu/ldc/ http://hdl.handle.net/10125/4467 TypeCraft from Pavel Mihaylov and Dorothee Beermann Reviewed by Sc o t t Fa r r a r, University of Washington 1. OVERVIEW. The following review of TypeCraft is based on experience with the tool as of August, 2009. TypeCraft is an on-line database used to create, share, search, and present linguistically annotated texts (i.e., interlinear glossed texts). Users are able to create their own texts either by uploading files or by typing in texts by hand. TypeCraft aids the user in annotation by providing a visual framework for tokenization and by presenting hundreds of common linguistic categories. The data are sharable, since the tool is meant to encourage collaborative efforts: several people can edit a database at the same time. TypeCraft provides a rich search facility over the texts in the database, allowing users to search by linguistic level (phrase, word, morpheme) or by annotation element (gloss, feature, etc.). Finally, the tool can be used to present data in a variety of handy formats, including HTML, LaTeX, XML, and wiki markup. 2. OVERALL DESIGN. TypeCraft is designed in Java using current Web technologies, including a MediaWiki interface and a PostgreSQL database backend. The wiki interface provides a familiar, clean look and feel to the user experience. For instance, there is no need to have several programs open at once (editor, browser, terminal) while editing an annotated text. The wiki is easy to use, as the MediaWiki software (http://www.mediawiki. org) is quite familiar to many Web users (cf. http://wikipedia.org). Currently, TypeCraft runs only under the Firefox 3 browser. 2.1 ANNOTATION PROCESS. To begin the annotation process, the user is presented with a Text editor frame as shown in figure 1. The interface is similar to an on-line word processing application (such as Google docs). The user first chooses a language to assign to the text, and this is accomplished by using an auto-complete box. The list of languages is generated from Ethnologue (Lewis 2009; http://www.ethnologue.com). Once a language name is chosen, a link to the specific Ethnologue page for that language is created. For instance, if the language name Spanish is chosen, a link is created to: http://www.ethnologue.com/show_language.asp?code=spa The user is expected to use a language name, and not a language code, to search for the language entry. If the language name does not exist, then TypeCraft accepts the entry but throws an error when the user first tries to save the text. Thus, either the language has to be in Ethnologue or the string undetermined is used. Furthermore, the user can only use the primary language name (presumably generated from the open tables of Ethnologue). The next step is to enter a text into the provided text box. Unicode is supported, though the user must enter the Unicode symbols into the text areas directly by cut and paste: there Licensed under Creative Commons Attribution Non-Commercial Share Alike License E-ISSN 1934-5275

Review of TypeCraft 61 is no soft keyboard built into TypeCraft. As a working example, consider the following Lule Sami text from TypeCraft s database: (1) Gulluvasjvuohta ietjama duobddágijda sisñemusán vuona sinna la nanos. Dáppe máhtáv sámevuodanam viessot. The user highlights each phrase to annotate, and TypeCraft builds a list of phrases, each of which shows up as a hyperlink in an adjacent area. If the user clicks on a link, Type- Craft then presents the text for annotation. The user can then begin to segment and insert delimiters (hyphens or spaces) between stems and affixes. Once this is done, TypeCraft builds a matrix for further annotation. As shown in figure 1, the matrix contains five rows: Latinised, Morpheme, Meaning, Gloss, and POS. Fi g u r e 1: The matrix as built automatically from a text. It is the task of the user, then, to tab through the matrix and enter various elements of annotation. For instance, when the user clicks on the POS (part of speech) box and begins to type, an auto-complete function is called which allows the user to search for the appropriate gloss element. The drop-down auto-complete box is shown in figure 2 for N (noun). TypeCraft also uses a Lazy Annotation Mode, where glosses that have already been used are retrieved from the database and displayed to the user on subsequent annotations.

Review of TypeCraft 62 2.2 USE OF TERMINOLOGY. The terminology is stored internally within TypeCraft and cannot be customized. For instance, there are 66 POS category labels (e.g., PRON pronoun, REL relative, DET determiner) and 214 glossing tags (e.g., HUM human Animacy, ADD additive Aspect). Fi g u r e 2: The auto-complete menu for POS. These are the same categories used in TypeCraft s search facility (discussed below). 2.3 COLLABORATIVE ASPECTS. As for collaboration, the design allows for the single user (possibly with several different work spaces) as well as for groups of users. The latter adds an element of privacy to data collaboration, either because the project is not yet finished, or because of the cultural sensitivity of the material. Every time a text is created, the option of sharing with a particular group is presented. TypeCraft is well-suited for collaboration of native speakers with fieldworkers. The interface is fairly straightforward. In one scenario, the native speaker could enter the text and translation, while the linguist could add the annotation at a later stage. In another scenario, a group of linguists could use the tool, each responsible for annotating a particular aspect of the same text. 3. SEARCH FACILITY. A major appeal of the TypeCraft system is the ability to search both across whole texts (by title and language) and across any element of the text (phrases, words, or morphemes). That is, it is possible to search the TypeCraft database using the familiar exact word or exact phrase match method. In addition, one can search using exact morpheme and according to the elements of annotation. The search-on-annotation

Review of TypeCraft 63 feature provides for very detailed searches. For instance, one could search using the following combination of features: phrases containing light verbs containing activity verbs topicalized information That is, search for phrases containing light verb and activity verbs where information is topicalized. Categories for phrase-level entities are auto-generated based on two different tagsets, currently default and label conventions. The following is an example of the combinable features (using AND or OR) at the morpheme level: active 3SG AUX That is, search for auxiliaries in combination with active voice and third person, singular number. The same tagsets (for POS and for glossing elements) are used in the annotation editor. 4. INTEGRATION WITH OTHER TOOLS. TypeCraft uses a PostgreSQL database backend. All information entered into the system is stored remotely, an example of cloud computing. Cloud computing refers to Web-based applications such that both data and application reside not on the user s machine, but in the cloud, i.e., on a remote server or group of servers. Cloud computing frees the user from having to worry about data loss or software upgrades. Of course, the limits of cloud computing are not inconsiderable, especially for fieldworkers who may not have Internet access. As a consequence of all user data being stored remotely in a database, data projects are potentially affected by global database changes. For instance, if the database maintainers were to change an annotation tag from TNS to TENSE, then the change would show up in all records. All data from the TypeCraft database may be exported in a variety of formats, including HTML, LaTeX, or XML. Furthermore, to encourage the sharing of data on wikis, texts can be exported in wiki format. Both the HTML and LaTeX formats are based on a simple tabular presentation of the data, while the XML output conforms to an XML schema containing nested elements (Phrase, Word, Morpheme, POS). And finally, since TypeCraft is designed as part of a wiki system, media including sound files and images are easily incorporated into the user s site. See, for example, this page on Èdó: http://www.typecraft. org/tc2wiki/user:ota. It is currently not possible to upload Toolbox files into the TypeCraft database. 5. WEBSITE AND DOCUMENTATION. The TypeCraft site (http://www.typecraft.org) is organized using MediaWiki software. The site actually contains the application, its documentation, and the community portal, plus the wiki. As such, the wiki is a kind of one-stop interface for all of TypeCraft s functionality. There is a step-by-step tutorial that walks a user through a typical annotation session. The details of the tutorial are adequate and not overly detailed, as it is presented as a quick start. To be considered as part of the documentation are the tables of linguistic categories (POS and glosses). For a given element of

Review of TypeCraft 64 annotation, the tables list the name, abbreviation, and class to which the element belongs. Finally, there is a presentation of how TypeCraft data can be embedded into a wiki system, something that is quite useful considering the nature of TypeCraft itself. 6. COMPARISON WITH OTHER TOOLS. Another tool with similar functionality is the Field Linguist s Toolbox (available at http://www.sil.org/computing/toolbox). To compare TypeCraft with Toolbox, a number of dimensions can be considered, including platform, basic functionality, and interlinear text creation/annotation. The obvious difference of course is that while TypeCraft lives in the cloud, Toolbox is a stand-alone desktop application to be downloaded and installed locally. As a result, TypeCraft is arguably better suited in terms of platform, as it can run on any machine with Firefox, including Windows, Mac, and most varieties of Linux. Toolbox runs only on Windows and Mac (with questionable runability on Linux systems through a Linux emulator on Windows/Mac). In terms of basic functionality, Toolbox is the more all-in-one tool for the linguist who needs the ability to manage entire field projects involving lexicons and interlinear texts. TypeCraft is not intended for creating lexicons (though the developers plan to include such functionality in the future). Thus, the main comparison to be made between these tools concerns the creation and annotation of texts. Toolbox, like TypeCraft, offers the user a semi-automated means to segment the text. With Toolbox the user can use a lexicon that has already been compiled to aid in the segmentation. In TypeCraft, the user performs this task manually by inserting, for example, hyphens between morphemes. Unicode poses an issue for Toolbox users, because success is based on the user s local system and whether certain fonts are installed. With TypeCraft, Unicode support depends on the Firefox browser, a situation that is usually less problematic than a dependence on local fonts. Both applications allow for export to different file formats, such as XML. TypeCraft seems to offer an easier path to HTML and PDF (via LaTeX). In terms of glossing tags, TypeCraft comes with a predefined set, while Toolbox is largely free-form. This is perhaps not a meaningful comparison to the average singleproject user, but it is a key aspect of TypeCraft that allows for a community of practice to form. That is, all data that are annotated using TypeCraft are searchable using a common interface. Linguists can see how others have marked up similar data, along with where and how particular tags are used. Arguably, the same could be said of Toolbox if a large enough database of Toolbox files were publicly available, something that does not currently seem to be the case. 7. CONCLUSIONS. TypeCraft, then, is a very promising on-line tool for the collaborative annotation of interlinear glossed texts. It allows for data creation, data sharing, and data presentation. In addition, TypeCraft provides a rich search interface for the data contained within the on-line database. Because it is implemented in the cloud, there is no need to install software (other than Firefox). And even for beginning students or for native speakers without linguistics expertise, the tool can be a useful way to create data. It has a one-stop interface implemented via the easy-to-learn Media Wiki software. The availability of hundreds of pre-defined linguistic categories is impressive. This tool has some aspects that could be improved. The first is the inability to run on browsers other than Firefox (though it should be said that this situation is considerably

Review of TypeCraft 65 better than with a tool that requires proprietary software or hard-to-install plug-ins). The search facility could be more user-friendly, in terms of both the slightly out-dated, pulldown-style menus and the allowable searches. Currently, any combination of features is allowed, and the choices are just too open ended. This could be viewed as a feature of the system, though for the average user, it is a little daunting. Though TypeCraft has an impressive inventory of linguistic categories, a more detailed explanation of each would be a convenient feature to add in the next version. Since TypeCraft is wiki-based, the explanations could easily be linked to a community discussion. Primary function: Pros: Cons: Platforms: Firefox 3 Open Source: Proprietary: Available from: Cost: Linguistic annotation of interlinear texts; search over a database of texts; collaboration Collaborative computing in the cloud; one-stop interface for annotation and search; availability of many linguistic categories; several export formats (HTML, LaTeX, XML, wiki markup) Works only with Firefox; categories need explanation No Yes (but open for all to use) http://www.typecraft.org Free Reviewed Version: August 13, 2009 Application Size: Documentation: Unknown http://www.typecraft.org/tc2wiki/help:contents Re f e r e n c e s Lewis, M. Paul (ed.). 2009. Ethnologue: Languages of the world. 16th edn. Dallas, TX: SIL International. Online version: http://www.ethnologue.com. Scott Farrar farrar@uw.edu