A Web-based Tool for the Integrated Annotation of Semantic and Syntactic Structures


Richard Eckart de Castilho, Éva Mújdricza-Maydt, Seid Muhie Yimam, Silvana Hartmann, Iryna Gurevych, Anette Frank, and Chris Biemann

Ubiquitous Knowledge Processing Lab, Department of Computer Science, Technische Universität Darmstadt
FG Language Technology, Department of Computer Science, Technische Universität Darmstadt
Department of Computational Linguistics, Heidelberg University
Research Training Group AIPHES, Heidelberg University and Technische Universität Darmstadt

Abstract

We introduce the third major release of WebAnno, a generic web-based annotation tool for distributed teams. New features in this release focus on semantic annotation tasks (e.g. semantic role labelling or event annotation) and allow the tight integration of semantic annotations with syntactic annotations. In particular, we introduce the concept of slot features, a novel constraint mechanism that allows modelling the interaction between semantic and syntactic annotations, as well as a new annotation user interface. The new features were developed and used in an annotation project for semantic roles on German texts. The paper briefly introduces this project and reports on experiences performing annotations with the new tool. In a comparative evaluation, our tool reaches significant speedups over WebAnno 2 for a semantic annotation task.

1 Introduction

As natural language processing pushes towards natural language understanding, i.e. the ability for a machine to process the meaning of language rather than just its structure, there is a growing need for corpora analysed on the semantic level. Semantic structures pose different requirements for annotation tools than morphological or syntactic annotation.
For example, the rich sets of semantic categories used in tasks such as semantic role labelling (SRL) or event annotation require special support to avoid the choice becoming a burden to the annotator. Also, due to the interaction of semantics with other levels of linguistic analysis, particularly syntax, it is desirable to work with generic annotation tools that simultaneously support multiple levels of annotation. Generic web-based annotation tools do not sufficiently support the annotation of semantic structures with their rich tagsets. Also, specialised annotation tools, in particular for SRL, are not flexibly adaptable to other annotation schemes, and many of them are technologically outdated.

WebAnno 3 is the third major release of the web-based annotation tool WebAnno (Yimam et al., 2013; Yimam et al., 2014). It introduces new functionalities that enable the annotation of semantic structures:

1. Slot features allow the appropriate modelling of predicate-argument structures for SRL. We also support the following additional semantic annotation types: participants and circumstances for event annotation, n-ary relations for relation extraction, and slot-filling tasks for information extraction.

2. Constraints help annotators by performing a context-sensitive filtering of the rich semantic tagsets. For example, the sense of a semantic predicate determines the available argument roles. Such filtering is necessary to avoid losing valuable time by having annotators search through a large number of tags or type in tags manually. Constraint rules can be defined manually or they can be generated automatically, e.g. from machine-readable lexical resources. To our knowledge, no other web-based annotation tool offers comparable functionality.

3. An improved annotation interface provides a streamlined annotation process, using a permanently visible sidebar instead of a pop-up dialog for editing annotations and their features.

These new functionalities integrate well with the existing functionalities in WebAnno 2, in particular its support for the annotation of syntactic structures, thus enabling semantic annotation in coordination with syntactic annotation. To our knowledge, WebAnno 3 is presently the only web-based and team-oriented annotation tool that supports the annotation of both semantic and syntactic structures. This comparison includes, but is not limited to, the tools discussed in Section 2.

This work is licensed under a Creative Commons Attribution 4.0 International Licence. License details: org/licenses/by/4.0/

WebAnno 3 was developed and implemented in close coordination with users in the context of an annotation project (cf. Mújdricza-Maydt et al. (2016)) for word sense disambiguation (WSD) and SRL on German texts, and was driven by the project's practical requirements. SRL is the task of identifying semantic predicates and their arguments, and of assigning roles to these arguments. It is a difficult task usually performed by experts. Examples of well-known SRL schemes motivated by different linguistic theories are FrameNet (Baker et al., 1998), PropBank (Palmer et al., 2005), and VerbNet (Kipper Schuler, 2005). SRL annotation is typically based on syntactic structures obtained from treebanks, such as the constituent-based Penn Treebank (for PropBank annotation), or the German TIGER treebank for FrameNet-style annotation (Burchardt et al., 2009). An argument is typically identified by the span of its syntactic head or syntactic constituent. For some annotation schemes (e.g. FrameNet), the task also includes WSD. In this case, the sense label typically determines the available argument slots.
The example below shows an annotation using FrameNet; the predicate ask receives the frame label Questioning (corresponding to its word sense) and its arguments are annotated as Addressee, Speaker, Message, and Iterations:

(1) [Fred]Role:Addressee didn't answer.
(2) While [I]Role:Speaker had [asked]Frame:Questioning [this question]Role:Message [twice]Role:Iterations before.

Note that there are multiple interdependencies between the annotations involved here. For example, the available sense labels for the semantic predicate depend on its lemma. Also, the available argument roles depend on the sense label. We further describe the annotation project and the application of WebAnno 3 within the project in Section 4. While joint WSD and SRL annotations are conveniently supported and facilitated using constraints, they can also be performed separately and independently of one another.

2 Related Work

We briefly review presently available annotation tools that could be used for annotating semantic structures. Since we aim to support geographically distributed annotation teams, we consider recent generic web-based annotation tools. Additionally, we examine tools from earlier semantic annotation projects that are specialised for SRL but not web-based.

2.1 Web-based Annotation Tools

Anafora by Chen and Styler (2013) is a recent web-based annotation tool for event-like structures. Specifically, it supports the annotation of spans and n-ary relations. Spans are anchored on text while relations exist independently from the text and consist of slots that can be filled with spans. Annotations are visualised using a coloured text background. Selecting a relation highlights the participating spans by placing boxes around them. Anafora is not suited for annotation tasks that require an alignment of the semantic structures with syntactic structures such as constituent or dependency parse trees.

brat by Stenetorp et al. (2012) is another web-based annotation tool with a focus on collaborative annotation. The tool supports spans and n-ary relations (also called events). Annotations are visualised as boxes and arcs above the text. Multiple annotators can simultaneously work on the same annotations instead of being isolated from each other. However, this removes the ability to calculate inter-annotator agreement. All annotation actions are performed through a pop-up dialog, which necessitates many actions even for simple annotations. While in principle the support for semantic annotation in brat through the n-ary relations is good, there is no support for guiding the user through rich semantic tagsets, e.g. by showing only applicable tags based on the annotation's context.

WebAnno 2 (Yimam et al., 2014), for which we present an updated version here, is a generic and flexible annotation tool for distributed teams. To visualise the text and annotations, it uses the JavaScript-based annotation visualisation from brat. The annotation of semantic structures can only be realised with great difficulty in WebAnno 2. It requires the creation of a custom span layer to represent semantic predicates and arguments. An additional custom relation layer is required to connect the arguments to the predicate. The distinction between predicates and arguments needs to be made through a feature on the span layer (e.g. semantictype=pred|arg), because the use of two distinct span layers for predicates and their arguments is not possible. The reason is that WebAnno 2 only allows relations to be created between annotations on the same layer. With such a setup, annotators have to take great care not to create invalid semantic structures, e.g. by linking predicates to other predicates or arguments to other arguments while the annotation guidelines only allow links between predicates and arguments. This complicates and slows down the annotation process and requires post-hoc consistency checks. Finally, WebAnno 2 offers no provisions to guide annotators through rich semantic tagsets. Due to the complex setup, annotating semantic structures also requires too many user interactions (clicks), making it a very tedious task. This problem is aggravated by an annotation dialog popping up for each action.

2.2 Semantic Role Annotation Tools

SALTO by Burchardt et al. (2006) supports the annotation of constituency treebanks with FrameNet categories. Once a frame for a predicate has been selected, applicable roles from FrameNet can be assigned to nodes in the parse tree by drag-and-drop.
SALTO supports discontinuous annotations, multi-token annotations, and cross-sentence annotations. It also offers basic team management functionalities including workload assignment and curation. However, annotators cannot correct mistakes in the underlying treebank because the parse tree is not editable. This is problematic for automatically preprocessed input. The final release of SALTO was in [...].

Jubilee and Cornerstone by Choi et al. (2010) are tools for annotating PropBank. Jubilee supports the annotation of PropBank instances, while its sister tool Cornerstone allows editing the frameset XML files that provide the annotation scheme to Jubilee. The user interface (UI) of Jubilee displays a treebank view and allows annotating nodes in the parse tree with frameset senses and roles. Jubilee supports annotation and adjudication of annotations in small teams. It is a Java application that stores all data on the file system. Thus, the annotation team needs to be able to access a common file system, which does not meet our needs for a distributed team. Both tools appear to be no longer being developed since [...].

2.3 Requirements of Semantic Annotation

The annotation of semantic structures imposes two main requirements on annotation tools: 1) the flexibility to support multiple layers of annotation including syntactic and semantic layers using freely configurable annotation schemes and 2) the ability to handle large, interdependent tagsets.

Flexible multi-layer annotation. While the usage-driven design of dedicated SRL annotation tools allows for a very efficient annotation, users face a serious lack of flexibility when trying to combine different annotation schemes (e.g. GermaNet senses (Hamp and Feldweg, 1997) and VerbNet roles), or when trying to use data preprocessed in different ways (e.g. for a crowdsourcing approach, automatically pre-annotating predicate and argument spans can be helpful, while experts may find pre-annotated dependency relations beneficial).
This is not supported by current web-based annotation tools. Handling rich annotation schemes. Tools need to specifically support rich semantic annotation schemes like FrameNet with interdependent labels (i.e. sense labels determine available argument roles). Manually typing sense and role labels is error-prone, and selecting them from a long list is cumbersome for the annotator. The tool must guide the annotator, e.g. by filtering or reordering possible sense labels based on the lemma and do the same for role labels based on the selected sense. E.g. SALTO only offers frames in the spectrum of the lemma under annotation.

3 Semantic Annotation using WebAnno 3

To meet the requirements described above, WebAnno 3 introduces support for arbitrary semantic role labelling schemes by geographically distributed teams. The great challenge in building WebAnno 3 was to identify a way of implementing support for semantic annotation not only without adversely affecting existing features of WebAnno 2, but also by capitalising on the existing features. The data model for annotations used by WebAnno is based on feature structures, i.e. typed key-value pairs. First, our tool adds a new type of feature for use in these structures, namely slot features, which enable SRL and other slot-filling annotation tasks. Next, we introduce constraint rules to allow the modelling of tag interdependencies. Constraint rules can be used not only to provide context-sensitive assistance for semantic annotations, but in general for all kinds of annotations supported by WebAnno 3, including those inherited from WebAnno 2. Finally, an improved and streamlined user interface was introduced to enable annotation to be performed without disruptive pop-up dialogs.

3.1 Semantic Annotation using Slot Features

WebAnno provides a flexible configuration system for annotation layers supporting spans and relations. However, as outlined in Section 2.1, the provided features are insufficient for semantic annotation. WebAnno 3 introduces a new feature type into the feature-structure-based system, the so-called slot features, to support semantic annotations like SRL. Slot features can be added to span layers and allow one span annotation (slot owner) to link to one or more other span annotations (slot fillers). Each link is qualified with a role, which may optionally be restricted by a tagset. Slot features can be configured to allow arbitrary slot fillers or only those pertaining to a specific layer. Figure 1 illustrates the annotation of word sense and semantic roles for ask as introduced in Example (1).

Declaration. We model the semantic predicate using a span layer SemPred with a slot feature called arguments. A second span layer SemArg represents the slot fillers.[2]

Interaction. To use the slot feature, the user first creates the slot-owning annotation, here a SemPred annotation. Then, the user selects a role to add a new slot to the feature arguments. Selecting a span then automatically creates the slot filler annotation (SemArg) and completes the link.

Visualisation. The links between slot owner and slot fillers are visualised as arcs in the colour of the slot owner. A special dashed style is used to distinguish them from other types of relations.

Agreement. With the introduction of slot features, we completely reimplemented the calculation of inter-annotator agreement. A single feature of a layer can be selected for agreement calculation (e.g. the word sense feature or the semantic argument feature). The first step in the calculation of agreement identifies which annotations actually need to be compared to each other. To address this, we first derive coordinates from the annotated data (Table 1). For regular span features, a coordinate is the tuple <layer, feature, char offsets(begin-end)>, and for relation features it is <layer, feature, char offsets(srcbegin-srcend, tgtbegin-tgtend)>. Then, agreement is calculated by comparing the labels assigned by each user for each coordinate.

We implemented two modes of calculating agreement for slot features. The first mode, role-as-label, calculates agreement based on the slot's role. Consider asked as the semantic predicate and Fred as a slot filler. Two annotators agree if they assigned the same role to Fred, e.g. Addressee. It is a disagreement if one annotator assigns the role Speaker to Fred and another assigns the role Addressee (cf. Table 1 d/e). In this mode, the slot filler is part of the annotation coordinates (cf. Table 1 column extra). In the second mode, target-as-label, agreement is calculated based on the slot filler.
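The coordinate derivation and the two slot comparison modes can be sketched roughly as follows. This is a minimal illustration in Python and not WebAnno's implementation: the names `span`, `coordinates`, and `compare` are hypothetical, each annotator is assumed to contribute at most one label per coordinate, and the character offsets are computed from the example sentence rather than taken from the tool.

```python
from collections import defaultdict

# Example text from Example (1); offsets are derived from it, not from WebAnno.
TEXT = "Fred didn't answer. While I had asked this question twice before."


def span(word):
    """Character offsets (begin, end) of the first occurrence of word."""
    begin = TEXT.index(word)
    return (begin, begin + len(word))


# Hypothetical slot links: (annotator, slot owner offsets, role, slot filler offsets)
LINKS = [
    ("A", span("asked"), "Addressee", span("Fred")),
    ("B", span("asked"), "Speaker", span("Fred")),
]


def coordinates(links, mode):
    """Map each coordinate to {annotator: label} for the slot feature."""
    table = defaultdict(dict)
    for user, owner, role, filler in links:
        if mode == "role-as-label":
            # The filler is part of the coordinate, the role is the label.
            coord, label = ("SemPred", "arguments", owner, filler), role
        elif mode == "target-as-label":
            # The role is part of the coordinate, the filler is the label.
            coord, label = ("SemPred", "arguments", owner, role), filler
        else:
            raise ValueError(f"unknown mode: {mode}")
        table[coord][user] = label
    return dict(table)


def compare(table, users):
    """Classify each coordinate as agreement, disagreement, or incomplete."""
    result = {}
    for coord, labels in table.items():
        if any(user not in labels for user in users):
            result[coord] = "incomplete"  # at least one annotator is missing
        elif len(set(labels.values())) == 1:
            result[coord] = "agreement"
        else:
            result[coord] = "disagreement"
    return result


for mode in ("role-as-label", "target-as-label"):
    outcome = compare(coordinates(LINKS, mode), ["A", "B"])
    print(mode, list(outcome.values()))
```

With the two links above, role-as-label yields a single shared coordinate with conflicting labels, while target-as-label yields two separate coordinates that each lack a label from one annotator.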
Two annotators agree if they use the same slot filler for a slot with a given role. So consider again asked as the semantic predicate and Addressee as the role to be filled. In this case, there is an agreement if two annotators choose Fred as a slot filler and a disagreement if one annotator fills the slot with Fred and the other one with I. In this mode, the role label is part of the annotation coordinates (cf. Table 1 column extra).

Missing annotations are very common with slot features. For example, cases d) and e) in Table 1 differ in their coordinates, namely in the extra column which is used to discriminate between slots, but not in their label. This makes them two entirely different annotations which the respective other user simply did not annotate, rather than a case of disagreement. To cope with such missing annotations, Krippendorff's Alpha (Krippendorff, 1970) was added to the list of supported agreement measures. This measure is conceptually able to handle missing annotations.

[2] The layer names proposed here are meant as examples motivated by our use case. In WebAnno, the names of the layers and their features can in general be freely defined by the annotation project's creator.

Table 1: Examples of coordinates and labels for different layer and feature types in Figure 1 (except *). The columns Layer, Feature, Char offsets, and Extra together form the coordinates.

   Feature type              User  Layer       Feature         Char offsets              Extra       Label
a) string on span            A     Lemma       value           [...] (asked)                         ask
b) string on relation*       A     Dependency  DependencyType  [...] (I), [...] (asked)              nsubj
c) slot (role-as-label)      A     SemPred     arguments       [...] (asked)             0-4 (Fred)  Addressee
d) slot (target-as-label)    A     SemPred     arguments       [...] (asked)             Addressee   0-4 (Fred)
e) slot (target-as-label)*   B     SemPred     arguments       [...] (asked)             Speaker     0-4 (Fred)

Figure 1: Cross-sentence annotation/curation with slot features and constraint rules.

The level of detail on agreement calculation provided by WebAnno 2 proved to be insufficient. It was often not clear how the agreement values related to the actual annotations. Thus, while reimplementing the agreement calculation to support slot features, we also greatly enhanced the level of detail in the UI to explain which data was (not) used for calculation and why it may not have been possible to calculate agreement. Finally, we added the ability to export raw agreement data as CSV files for detailed inspection and external agreement calculation.

3.2 Handling Rich Tagsets via Constraints

Semantic annotation tasks typically operate with a very large set of labels. While a typical part-of-speech tagset contains around 50 tags, FrameNet contains over 800 frame labels. We currently observe over 1,000 role labels and more than 8,000 frame-role combinations in the FrameNet lexicon. However, given a specific lemma, there is only a small set of applicable frame labels that it evokes (e.g. ask can evoke the frames Questioning and Request). We introduce a generic mechanism that allows formalising such interdependencies as constraint rules.
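As a rough illustration of how such interdependencies could drive the ordering of labels, the following Python sketch evaluates simple rules against the features of an annotation. It is not WebAnno's implementation or rule syntax: the `Rule` class, the `reorder` function, and the flat `context` dictionary are hypothetical, and the frames Arriving and Motion only pad the example tagset. The sketch assumes that labels named by matching rules are moved to the front while all other labels remain selectable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """If all conditions hold, the preferred labels are offered first."""
    conditions: dict   # feature name -> required value (logical AND)
    feature: str       # feature whose labels are reordered
    preferred: tuple   # alternative labels, in rule order


def reorder(tagset, feature, context, rules):
    """Return the tagset with labels from matching rules moved to the front.

    No label is removed, so the annotator's choice is never restricted.
    """
    preferred = []
    for rule in rules:
        if rule.feature != feature:
            continue
        if all(context.get(key) == value for key, value in rule.conditions.items()):
            for label in rule.preferred:
                if label in tagset and label not in preferred:
                    preferred.append(label)
    rest = [label for label in tagset if label not in preferred]
    return preferred + rest


# Illustrative rules based on the ask/Questioning example in the text.
RULES = [
    Rule({"lemma": "ask"}, "senseId", ("Questioning", "Request")),
    Rule({"senseId": "Questioning"}, "role",
         ("Addressee", "Message", "Speaker", "Time", "Iterations")),
]

SENSES = ["Arriving", "Questioning", "Motion", "Request"]
print(reorder(SENSES, "senseId", {"lemma": "ask"}, RULES))
# prints ['Questioning', 'Request', 'Arriving', 'Motion']
```

Keeping the non-matching labels at the end of the list, instead of dropping them, means an incomplete rule set can slow an annotator down but cannot block a valid label.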

Listing 1: Example FrameNet-style constraints

    SemPred {
      // Rule 1
      @Lemma.value = "ask" ->
        senseId = "Questioning" |
        senseId = "Request" |
        senseId = "XXX";
      // .. other lemmata

      // Rule 2
      senseId = "Questioning" ->
        // core roles
        arguments.role = "Addressee" (!) |
        arguments.role = "Message" (!) |
        arguments.role = "Speaker" (!) |
        // non-core roles
        arguments.role = "Time" |
        arguments.role = "Iterations";
      // .. other senses
    }

    Dependency {
      governor.pos.value = "NN" & dependent.pos.value = "DET" ->
        DependencyType = "DET";
    }

Listing 2: Grammar for constraint rules

    // Basic structure
    <file>            ::= <import> <scope>
    <scope>           ::= <shortLayerName> "{" <ruleset> "}"
    <ruleset>         ::= <rule>
    <import>          ::= "import" <qualifiedLayerName> "as" <shortLayerName>
    <rule>            ::= <conds> "->" <actions> ";"

    // Conditions
    <conds>           ::= <cond> | <cond> "&" <conds>
    <cond>            ::= <path> "=" <value>
    <path>            ::= <featureName> | <step> "." <path>
    <step>            ::= <featureName> | <layerSelector>
    <layerSelector>   ::= <layerOperator>? <shortLayerName>
    <layerOperator>   ::= "@"    // select annotation in layer X

    // Restrictions
    <actions>         ::= <action> | <action> "|" <actions>
    <action>          ::= <actionPath> "=" <value> ( "(" <flags> ")" )
    <actionPath>      ::= <featureName> | <actionPath> "." <featureName>
    <flags>           ::= "!"    // core role

Rules are typically defined over annotations and their labels, but may also access the annotated text. A rule consists of a left-hand side expressing the conditions under which the rule applies and a right-hand side expressing the effect of the rule. A logical conjunction of multiple conditions can be specified. If multiple effects for a rule are specified, they are considered to be alternatives. The mechanism does not limit the annotator's choice.
It only reorders the list of labels the users can choose from, showing first those labels covered by constraints in a bold font, then the rest of the labels. While constraint rules can be written manually, they can also be generated automatically from machine-readable schemes. We provide suitable conversions from FrameNet and GermaNet (cf. Section 4).

Listing 1 shows example constraint rules for a FrameNet-like annotation style. These apply to annotations on the layer SemPred. Rule 1 reorders the choice of sense IDs for the SemPred annotation: if a Lemma annotation with value ask exists at the same character offsets as the SemPred (Figure 1, marker 1), then the sense IDs Questioning, Request, and XXX (other/unknown) should be offered as top labels in the sense ID dropdown selection (Figure 1, marker 2). Similar rules for other lemmata would normally follow. Rule 2 in Listing 1 illustrates constraints on the roles of slot features. If the senseId Questioning is chosen, Addressee, Message, Speaker, Time, and Iterations are offered as top labels in the argument role dropdown box (Figure 1, marker 3). Slots for core roles (marked with !) are added automatically while slots for other roles need to be added manually.

The constraint mechanism is generic. It uses a path-like notation to navigate along features pointing to other annotations (e.g. governor.pos.value), as illustrated in the dependency rule. Within conditions, a path component may also be a layer selector such as @Lemma, meaning "select all annotations of the specified layer at the same offsets as the current annotation". We use this in Listing 1 to access information from the Lemma layer. Listing 2 shows the full constraint rule grammar in a BNF-like notation.

An indicator (Figure 1, marker 4) next to a feature in the editor panel shows the status of constraints. The indicator is green if there are matching rules (e.g. there is a Lemma annotation with the value ask and there is a rule with a condition on the lemma value ask). It is orange if there are rules that affect the feature but none matches (e.g. there is at least one rule with a condition on the value of a Lemma but none that matches the specific value of the present annotation). Finally, it is red if there are rules, but the feature is not restricted by a tagset. In the latter case, the rules have no effect since, without a tagset, there are no labels to be reordered.

3.3 Redesigned Annotation User Interface

The annotation user interfaces of WebAnno 2 and brat use a pop-up dialog whenever an annotation is created or modified. This was perceived as a major impediment to efficient annotation by our annotation team, not only because the dialog obscures the underlying annotation interface, but mainly because the dialog needs to be closed by an explicit user action. Hence, we implemented a dialog-less annotation user interface, similar to the one in the Anafora tool. Along with the new interface, additional user-expressed needs for semantic annotation were addressed. These include better support for keyboard-based annotation, the ability to create zero-length annotations to represent slot fillers not realised in the text, the display of descriptive tooltips explaining semantic tags, the indication of tokens transitively covered by dependency relations, and the ability to curate cross-sentence annotations.

Dialog-less annotation. The UI elements used for editing the annotation feature values have been moved from a dialog (cf. brat) to a permanently visible detail panel on the side of the screen (Figure 1, marker 5). Selecting a span of text or drawing an arc now immediately creates an annotation on the active layer and loads its features in the detail panel. Changes to the feature values are immediately reflected in the annotation view. For tasks that require repeatedly annotating the same feature values (e.g. annotating persons in a named entity task), the option save feature can be enabled for each feature individually. New annotations are then created with the same feature values as the previously selected annotation.

Keyboard-based annotation. Keyboard-based annotation can significantly speed up the annotation process. For annotation layers with a single tagset-controlled feature (e.g. part-of-speech), we introduce a new keyboard-based forward annotation mode (Figure 1, marker 6).
In this mode, pressing a key selects the first tag starting with the respective character and pressing it again selects the next. As annotators memorise the number of key presses for the tags, we expect their efficiency to improve. Once a tag is chosen, pressing space automatically selects the next token for annotation.

Zero-width annotations. Zero-width span annotations can now be created (at arbitrary positions in the text), e.g. to instantiate semantic predicates or arguments that are not realised in the text.

Tooltips. Tag descriptions are now displayed as tooltips (Figure 1, marker 7) during annotation. In particular for rich semantic tagsets, this is an important improvement. The tag descriptions are part of the tagset definitions which can be edited via the user interface in the project settings. Alternatively, they can be externally generated and imported as JSON files. For the purpose of our annotation project, we automatically derived tagset files from VerbNet and GermaNet.

Relation yield highlighting. Moving the mouse cursor over a span with outgoing relations now displays a tooltip showing the yield of the relation (Figure 1, marker 8). E.g. for dependency relations, the yield represents all the tokens subsumed by the respective dependency node. This feature facilitates the coordination of syntactic and semantic annotations when using syntactic heads as slot fillers. In the annotation project, it was determined that dependency-based annotation, enhanced with highlighting of the dependency relation yield, proved superior to span-based annotation, which is more error-prone.

Cross-sentence curation. We extended the curation component of WebAnno to support the adjudication of multiple annotations for individual sentences. This was necessary in order to curate cross-sentence semantic relations, e.g. to use parts of a previous sentence as slot fillers. An example of such a case is provided in Figure 1 (marker 9).
Curators can now define the number of sentences to visualise and curate at a time. Additionally, an indicator is displayed for any relation annotations that are not rendered because one of their endpoints is realised outside of the window of currently displayed sentences (Figure 1, marker 10).

4 Experiences and Evaluation

4.1 Annotation Study

Our new tool was tested in a large-scale semantic annotation study on German standard as well as non-standard texts (cf. Mújdricza-Maydt et al. (2016)). An annotation team performed a combination of word sense and semantic role annotation based on GermaNet senses and VerbNet-style role labels. As basis for the annotation, the new slot features were used to model predicate-argument structures. Compared to earlier experiences with predicate-argument annotation using relation layers in WebAnno 2, the annotation team reported improved comfort and efficiency with the new interface. Having the annotation detail panel visible all the time and no longer being forced to constantly switch between annotation dialog and annotation view was perceived as an important factor for this improvement. This study also made extensive use of constraint rules and relied heavily on the new tooltips explaining senses and roles to the annotators.

In a pilot study for the project, different strategies of selecting slot fillers were examined. The strategy that was eventually selected relied on pre-annotated dependency relations and required the annotator to select the syntactic head of a phrase to be used as a slot filler (e.g. house instead of the house). The new ability to display the yield of the dependency relations proved to be instrumental in this task for locating the correct head words.

To facilitate SRL annotation projects for others, we provide in addition to the tool itself 1) the setup for our annotation study as a demo project; 2) configuration files for different predicate sense and semantic role labeling frameworks (VerbNet, FrameNet, and PropBank), including their respective tagsets and constraint rules.

Table 2: Annotation times in minutes:seconds for WebAnno 2 and 3 with and without constraints.

Doc   a) WebAnno 2   b) WebAnno 3, no constraints   c) WebAnno 3, constraints
1     14:58          10:07                          08:[...]
2     [...]:21       10:29                          08:[...]
3     [...]:19       04:16                          03:[...]
4     [...]:40       08:22                          07:22
All   43:18          33:14                          28:22

4.2 Comparative Evaluation

After the development had been finalised, we conducted a comparative evaluation study in order to measure whether our efforts in tool engineering translate into annotation time speedups, which directly influence annotation cost. For this, we chose the following setup: an annotator was presented with a static display of gold standard predicate-argument annotations like the ones shown in Figure 1.
In this way, we minimise fluctuations caused by the cognitive load of choosing the right labels, simulating a maximally trained annotator. The annotator had ample experience with both annotation tools, but not with the specific settings. Thus, for each setting, the annotator first became familiar with the setup on a small training document. We conducted this annotation in three different settings: a) using WebAnno 2 (version 2.3.1), modelling the annotation task as outlined in Section 2.1, b) using WebAnno 3 without constraints and c) using WebAnno 3 with constraints for rules that re-order semantic predicates according to the lemma value, as exemplified in Figure 1 (marker 4). To avoid interpersonal differences in annotation speed, the same annotator worked on all three settings.

The data set on which time was measured included four documents with a total of 64 semantic predicates, filling a total of 115 slots with overall 94 semantic arguments (SemArg in Figure 1). Note that the same semantic argument can fill slots of different predicates. Table 2 shows the times in minutes and seconds measured for the four documents. The improvements of the user interface in setting b) already result in about 23% less annotation time compared to WebAnno 2 (33:14 instead of 43:18 in total). In combination with constraints, a reduction of over 34% is reached in setting c) (28:22 in total), meaning that the same annotator can produce about 50% more annotations in the same time. The annotator reported that she needed to perform considerably fewer interactions in setting c).

5 Conclusion

We have introduced WebAnno 3, the third major release of WebAnno. The new version introduces major improvements that go significantly beyond the state of the art: they enable the annotation of semantic structure and the handling of rich semantic tagsets. The new version was developed in close coordination with an associated annotation project performing combined WSD and SRL annotation on German texts.
Moreover, the tool can be used for many non-linguistic annotation tasks in various domains, e.g. the annotation of metaphors and ambiguities (Digital Humanities), skills and qualifications (Human Resources), sentiment in product reviews (Advertisement/Marketing), etc. A comparative evaluation showed considerable speedups over WebAnno 2 and validates both our adaptation of the user interface and the incorporation of the constraint language to facilitate the annotation of large, interdependent tagsets.

Acknowledgments

The work presented in this paper was funded by a German BMBF grant to the CLARIN-D project, by the German Research Foundation as part of the Research Training Group Adaptive Preparation of Information from Heterogeneous Sources (AIPHES) under grant No. GRK 1994/1, and by the Volkswagen Foundation as part of the Lichtenberg-Professorship Program under grant No. I/. We also thank Aakash Sharma and Sarah Holschneider for their active support in the development of WebAnno, the documentation of the tool, and for conducting the comparative evaluation study.

References

Collin F. Baker, Charles J. Fillmore, and John B. Lowe. The Berkeley FrameNet project. In Proc. of COLING-ACL 1998, pages 86–90, Montreal, Canada.
Aljoscha Burchardt, Katrin Erk, Anette Frank, Andreas Kowalski, and Sebastian Padó. SALTO – A Versatile Multi-Level Annotation Tool. In Proc. of LREC 2006, pages , Genoa, Italy.
Aljoscha Burchardt, Katrin Erk, Anette Frank, Andrea Kowalski, Sebastian Padó, and Manfred Pinkal. Using FrameNet for the Semantic Analysis of German: Annotation, Representation and Automation. In Hans C. Boas, editor, Multilingual FrameNets – Practice and Applications, pages . Mouton de Gruyter, Berlin, Germany.
Wei-Te Chen and Will Styler. Anafora: A web-based general purpose annotation tool. In Proc. of the NAACL HLT 2013 Demonstration Session, pages 14–19, Atlanta, GA, USA.
Jinho Choi, Claire Bonial, and Martha Palmer. Multilingual PropBank annotation tools: Cornerstone and Jubilee. In Proc. of the NAACL HLT 2010 Demonstration Session, pages 13–16, Los Angeles, CA, USA.
Birgit Hamp and Helmut Feldweg. GermaNet - a lexical-semantic net for German. In Automatic Information Extraction and Building of Lexical Semantic Resources for NLP Applications, pages 9–15, Madrid, Spain.
Karin Kipper Schuler. VerbNet: A Broad-coverage, Comprehensive Verb Lexicon. Ph.D. thesis, University of Pennsylvania, Philadelphia, PA, USA.
Klaus Krippendorff. Estimating the reliability, systematic error and random error of interval data. Educational and Psych. Measurement, 30(1):
Éva Mújdricza-Maydt, Silvana Hartmann, Iryna Gurevych, and Anette Frank. Combining Semantic Annotation of Word Sense & Semantic Roles: A Novel Annotation Scheme for VerbNet Roles on German Language Data. In Proc. of LREC 2016, pages.
Martha Palmer, Daniel Gildea, and Paul Kingsbury. The Proposition Bank: An annotated corpus of semantic roles. Computational Linguistics, 31(1):
Pontus Stenetorp, Sampo Pyysalo, Goran Topić, Tomoko Ohta, Sophia Ananiadou, and Jun'ichi Tsujii. brat: a web-based tool for NLP-assisted text annotation. In Proc. of EACL 2012, pages , Avignon, France.
Seid Muhie Yimam, Iryna Gurevych, Richard Eckart de Castilho, and Chris Biemann. WebAnno: A flexible, web-based and visually supported system for distributed annotations. In Proc. of ACL 2013: System Demonstrations, pages 1–6, Sofia, Bulgaria.
Seid Muhie Yimam, Richard Eckart de Castilho, Iryna Gurevych, and Chris Biemann. Automatic annotation suggestions and custom annotation layers in WebAnno. In Proc. of ACL 2014: System Demonstrations, pages 91–96, Baltimore, MD, USA.
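
Note on the figures in Section 4.2: the percentages quoted there can be recomputed from the "All" row of Table 2. The following is a minimal Python sketch of that arithmetic, not part of WebAnno; the helper names (to_seconds, time_saved, throughput_gain) are hypothetical and chosen only for illustration.

```python
# Recompute the relative speedups from the "All" row of Table 2.
# All names below are illustrative assumptions, not a WebAnno API.

def to_seconds(mmss: str) -> int:
    """Convert a 'minutes:seconds' string such as '43:18' to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)

# Total annotation times per setting, copied from the "All" row of Table 2.
TOTALS = {
    "a": "43:18",  # WebAnno 2
    "b": "33:14",  # WebAnno 3, no constraints
    "c": "28:22",  # WebAnno 3, constraints
}

def time_saved(baseline: str, setting: str) -> float:
    """Fraction of the baseline time that the setting saves."""
    return 1.0 - to_seconds(setting) / to_seconds(baseline)

def throughput_gain(baseline: str, setting: str) -> float:
    """Fractional increase in annotations produced per unit of time."""
    return to_seconds(baseline) / to_seconds(setting) - 1.0

if __name__ == "__main__":
    for key in ("b", "c"):
        saved = time_saved(TOTALS["a"], TOTALS[key])
        gain = throughput_gain(TOTALS["a"], TOTALS[key])
        print(f"setting {key}): {saved:.1%} less time, {gain:.1%} more output")
```

Under these totals, setting b) saves a little over 23% of the time and setting c) a little over 34%, and the throughput gain for c) comes out slightly above 50%, which matches the rounded figures in the text. Time saved and throughput gain are different ratios of the same two totals, which is why 34% less time corresponds to roughly 50% more annotations.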

WebAnno: a flexible, web-based annotation tool for CLARIN

WebAnno: a flexible, web-based annotation tool for CLARIN WebAnno: a flexible, web-based annotation tool for CLARIN Richard Eckart de Castilho, Chris Biemann, Iryna Gurevych, Seid Muhie Yimam #WebAnno This work is licensed under a Attribution-NonCommercial-ShareAlike

More information

Propbank Instance Annotation Guidelines Using a Dedicated Editor, Jubilee

Propbank Instance Annotation Guidelines Using a Dedicated Editor, Jubilee Propbank Instance Annotation Guidelines Using a Dedicated Editor, Jubilee Jinho D. Choi, Claire Bonial, Martha Palmer University of Colorado at Boulder Boulder, CO 80302, UA choijd@colorado.edu, bonial@colorado.edu,

More information

Let s get parsing! Each component processes the Doc object, then passes it on. doc.is_parsed attribute checks whether a Doc object has been parsed

Let s get parsing! Each component processes the Doc object, then passes it on. doc.is_parsed attribute checks whether a Doc object has been parsed Let s get parsing! SpaCy default model includes tagger, parser and entity recognizer nlp = spacy.load('en ) tells spacy to use "en" with ["tagger", "parser", "ner"] Each component processes the Doc object,

More information

NLP in practice, an example: Semantic Role Labeling

NLP in practice, an example: Semantic Role Labeling NLP in practice, an example: Semantic Role Labeling Anders Björkelund Lund University, Dept. of Computer Science anders.bjorkelund@cs.lth.se October 15, 2010 Anders Björkelund NLP in practice, an example:

More information

Assessing the benefits of partial automatic pre-labeling for frame-semantic annotation

Assessing the benefits of partial automatic pre-labeling for frame-semantic annotation Assessing the benefits of partial automatic pre-labeling for frame-semantic annotation Ines Rehbein and Josef Ruppenhofer and Caroline Sporleder Computational Linguistics Saarland University {rehbein,josefr,csporled}@coli.uni-sb.de

More information

Langforia: Language Pipelines for Annotating Large Collections of Documents

Langforia: Language Pipelines for Annotating Large Collections of Documents Langforia: Language Pipelines for Annotating Large Collections of Documents Marcus Klang Lund University Department of Computer Science Lund, Sweden Marcus.Klang@cs.lth.se Pierre Nugues Lund University

More information

Generating FrameNets of various granularities: The FrameNet Transformer

Generating FrameNets of various granularities: The FrameNet Transformer Generating FrameNets of various granularities: The FrameNet Transformer Josef Ruppenhofer, Jonas Sunde, & Manfred Pinkal Saarland University LREC, May 2010 Ruppenhofer, Sunde, Pinkal (Saarland U.) Generating

More information

structure of the presentation Frame Semantics knowledge-representation in larger-scale structures the concept of frame

structure of the presentation Frame Semantics knowledge-representation in larger-scale structures the concept of frame structure of the presentation Frame Semantics semantic characterisation of situations or states of affairs 1. introduction (partially taken from a presentation of Markus Egg): i. what is a frame supposed

More information

Jubilee: Propbank Instance Editor Guideline (Version 2.1)

Jubilee: Propbank Instance Editor Guideline (Version 2.1) Jubilee: Propbank Instance Editor Guideline (Version 2.1) Jinho D. Choi choijd@colorado.edu Claire Bonial bonial@colorado.edu Martha Palmer mpalmer@colorado.edu Center for Computational Language and EducAtion

More information

A Method for Semi-Automatic Ontology Acquisition from a Corporate Intranet

A Method for Semi-Automatic Ontology Acquisition from a Corporate Intranet A Method for Semi-Automatic Ontology Acquisition from a Corporate Intranet Joerg-Uwe Kietz, Alexander Maedche, Raphael Volz Swisslife Information Systems Research Lab, Zuerich, Switzerland fkietz, volzg@swisslife.ch

More information

Frame Semantic Structure Extraction

Frame Semantic Structure Extraction Frame Semantic Structure Extraction Organizing team: Collin Baker, Michael Ellsworth (International Computer Science Institute, Berkeley), Katrin Erk(U Texas, Austin) October 4, 2006 1 Description of task

More information

UIMA-based Annotation Type System for a Text Mining Architecture

UIMA-based Annotation Type System for a Text Mining Architecture UIMA-based Annotation Type System for a Text Mining Architecture Udo Hahn, Ekaterina Buyko, Katrin Tomanek, Scott Piao, Yoshimasa Tsuruoka, John McNaught, Sophia Ananiadou Jena University Language and

More information

Final Project Discussion. Adam Meyers Montclair State University

Final Project Discussion. Adam Meyers Montclair State University Final Project Discussion Adam Meyers Montclair State University Summary Project Timeline Project Format Details/Examples for Different Project Types Linguistic Resource Projects: Annotation, Lexicons,...

More information

Parmenides. Semi-automatic. Ontology. construction and maintenance. Ontology. Document convertor/basic processing. Linguistic. Background knowledge

Parmenides. Semi-automatic. Ontology. construction and maintenance. Ontology. Document convertor/basic processing. Linguistic. Background knowledge Discover hidden information from your texts! Information overload is a well known issue in the knowledge industry. At the same time most of this information becomes available in natural language which

More information

XML Support for Annotated Language Resources

XML Support for Annotated Language Resources XML Support for Annotated Language Resources Nancy Ide Department of Computer Science Vassar College Poughkeepsie, New York USA ide@cs.vassar.edu Laurent Romary Equipe Langue et Dialogue LORIA/CNRS Vandoeuvre-lès-Nancy,

More information

Find Problems before They Find You with AnnotatorPro s Monitoring Functionalities

Find Problems before They Find You with AnnotatorPro s Monitoring Functionalities Find Problems before They Find You with AnnotatorPro s Monitoring Functionalities Mohammed R. H. Qwaider, Anne-Lyse Minard, Manuela Speranza, Bernardo Magnini Fondazione Bruno Kessler, Trento, Italy {qwaider,minard,manspera,magnini}@fbk.eu

More information

SETS: Scalable and Efficient Tree Search in Dependency Graphs

SETS: Scalable and Efficient Tree Search in Dependency Graphs SETS: Scalable and Efficient Tree Search in Dependency Graphs Juhani Luotolahti 1, Jenna Kanerva 1,2, Sampo Pyysalo 1 and Filip Ginter 1 1 Department of Information Technology 2 University of Turku Graduate

More information

Building and Annotating Corpora of Collaborative Authoring in Wikipedia

Building and Annotating Corpora of Collaborative Authoring in Wikipedia Building and Annotating Corpora of Collaborative Authoring in Wikipedia Johannes Daxenberger, Oliver Ferschke and Iryna Gurevych Workshop: Building Corpora of Computer-Mediated Communication: Issues, Challenges,

More information

MDSWriter: Annotation tool for creating high-quality. multi-document summarization corpora.

MDSWriter: Annotation tool for creating high-quality. multi-document summarization corpora. MDSWriter: Annotation tool for creating high-quality multi-document summarization corpora Christian M. Meyer, Darina Benikova, Margot Mieskes, and Iryna Gurevych Research Training Group AIPHES Ubiquitous

More information

Annota&ng corpora with WebAnno

Annota&ng corpora with WebAnno Annota&ng corpora with WebAnno Darja Fišer Filozofska fakulteta Univerze v Ljubljani Zagreb, 4. 12. 2015 1. Lecture tool characterisbcs examples of use accessing the tool Outline 2. Demo annotabon curabon

More information

SAWT: Sequence Annotation Web Tool

SAWT: Sequence Annotation Web Tool SAWT: Sequence Annotation Web Tool Younes Samih and Wolfgang Maier and Laura Kallmeyer Heinrich Heine University Department of Computational Linguistics Universitätsstr. 1, 40225 Düsseldorf, Germany {samih,maierwo,kallmeyer}@phil.hhu.de

More information

Text Mining for Software Engineering

Text Mining for Software Engineering Text Mining for Software Engineering Faculty of Informatics Institute for Program Structures and Data Organization (IPD) Universität Karlsruhe (TH), Germany Department of Computer Science and Software

More information

Cornerstone: Propbank Frameset Editor Guideline (Version 1.3)

Cornerstone: Propbank Frameset Editor Guideline (Version 1.3) Cornerstone: Propbank Frameset Editor Guideline (Version 1.3) Jinho D. Choi choijd@colorado.edu Claire Bonial bonial@colorado.edu Martha Palmer mpalmer@colorado.edu Center for Computational Language and

More information

Making Sense Out of the Web

Making Sense Out of the Web Making Sense Out of the Web Rada Mihalcea University of North Texas Department of Computer Science rada@cs.unt.edu Abstract. In the past few years, we have witnessed a tremendous growth of the World Wide

More information

The Muc7 T Corpus. 1 Introduction. 2 Creation of Muc7 T

The Muc7 T Corpus. 1 Introduction. 2 Creation of Muc7 T The Muc7 T Corpus Katrin Tomanek and Udo Hahn Jena University Language & Information Engineering (JULIE) Lab Friedrich-Schiller-Universität Jena, Germany {katrin.tomanek udo.hahn}@uni-jena.de 1 Introduction

More information

Semantics Isn t Easy Thoughts on the Way Forward

Semantics Isn t Easy Thoughts on the Way Forward Semantics Isn t Easy Thoughts on the Way Forward NANCY IDE, VASSAR COLLEGE REBECCA PASSONNEAU, COLUMBIA UNIVERSITY COLLIN BAKER, ICSI/UC BERKELEY CHRISTIANE FELLBAUM, PRINCETON UNIVERSITY New York University

More information

Knowledge-based authoring tools (KBATs) for graphics in documents

Knowledge-based authoring tools (KBATs) for graphics in documents Knowledge-based authoring tools (KBATs) for graphics in documents Robert P. Futrelle Biological Knowledge Laboratory College of Computer Science 161 Cullinane Hall Northeastern University Boston, MA 02115

More information

Natural Language Processing Pipelines to Annotate BioC Collections with an Application to the NCBI Disease Corpus

Natural Language Processing Pipelines to Annotate BioC Collections with an Application to the NCBI Disease Corpus Natural Language Processing Pipelines to Annotate BioC Collections with an Application to the NCBI Disease Corpus Donald C. Comeau *, Haibin Liu, Rezarta Islamaj Doğan and W. John Wilbur National Center

More information

The INCEpTION Platform: Machine-Assisted and Knowledge-Oriented Interactive Annotation

The INCEpTION Platform: Machine-Assisted and Knowledge-Oriented Interactive Annotation The INCEpTION Platform: Machine-Assisted and Knowledge-Oriented Interactive Annotation Jan-Christoph Klie Michael Bugert Beto Boullosa Richard Eckart de Castilho Iryna Gurevych Ubiquitous Knowledge Processing

More information

Background and Context for CLASP. Nancy Ide, Vassar College

Background and Context for CLASP. Nancy Ide, Vassar College Background and Context for CLASP Nancy Ide, Vassar College The Situation Standards efforts have been on-going for over 20 years Interest and activity mainly in Europe in 90 s and early 2000 s Text Encoding

More information

Programming the Semantic Web

Programming the Semantic Web Programming the Semantic Web Steffen Staab, Stefan Scheglmann, Martin Leinberger, Thomas Gottron Institute for Web Science and Technologies, University of Koblenz-Landau, Germany Abstract. The Semantic

More information

MASC: The Manually Annotated Sub-Corpus of American English. Nancy Ide*, Collin Baker**, Christiane Fellbaum, Charles Fillmore**, Rebecca Passonneau

MASC: The Manually Annotated Sub-Corpus of American English. Nancy Ide*, Collin Baker**, Christiane Fellbaum, Charles Fillmore**, Rebecca Passonneau MASC: The Manually Annotated Sub-Corpus of American English Nancy Ide*, Collin Baker**, Christiane Fellbaum, Charles Fillmore**, Rebecca Passonneau *Vassar College Poughkeepsie, New York USA **International

More information

Aid to spatial navigation within a UIMA annotation index

Aid to spatial navigation within a UIMA annotation index Aid to spatial navigation within a UIMA annotation index Nicolas Hernandez LINA CNRS UMR 6241 University de Nantes Darmstadt, 3rd UIMA@GSCL Workshop, September 23, 2013 N. Hernandez Spatial navigation

More information

Getting Started with DKPro Agreement

Getting Started with DKPro Agreement Getting Started with DKPro Agreement Christian M. Meyer, Margot Mieskes, Christian Stab and Iryna Gurevych: DKPro Agreement: An Open-Source Java Library for Measuring Inter- Rater Agreement, in: Proceedings

More information

For many years, the creation and dissemination

For many years, the creation and dissemination Standards in Industry John R. Smith IBM The MPEG Open Access Application Format Florian Schreiner, Klaus Diepold, and Mohamed Abo El-Fotouh Technische Universität München Taehyun Kim Sungkyunkwan University

More information

On maximum spanning DAG algorithms for semantic DAG parsing

On maximum spanning DAG algorithms for semantic DAG parsing On maximum spanning DAG algorithms for semantic DAG parsing Natalie Schluter Department of Computer Science School of Technology, Malmö University Malmö, Sweden natalie.schluter@mah.se Abstract Consideration

More information

Army Research Laboratory

Army Research Laboratory Army Research Laboratory Arabic Natural Language Processing System Code Library by Stephen C. Tratz ARL-TN-0609 June 2014 Approved for public release; distribution is unlimited. NOTICES Disclaimers The

More information

ANC2Go: A Web Application for Customized Corpus Creation

ANC2Go: A Web Application for Customized Corpus Creation ANC2Go: A Web Application for Customized Corpus Creation Nancy Ide, Keith Suderman, Brian Simms Department of Computer Science, Vassar College Poughkeepsie, New York 12604 USA {ide, suderman, brsimms}@cs.vassar.edu

More information

Part 5. Verification and Validation

Part 5. Verification and Validation Software Engineering Part 5. Verification and Validation - Verification and Validation - Software Testing Ver. 1.7 This lecture note is based on materials from Ian Sommerville 2006. Anyone can use this

More information

Sustainability of Text-Technological Resources

Sustainability of Text-Technological Resources Sustainability of Text-Technological Resources Maik Stührenberg, Michael Beißwenger, Kai-Uwe Kühnberger, Harald Lüngen, Alexander Mehler, Dieter Metzing, Uwe Mönnich Research Group Text-Technological Overview

More information

Towards Corpus Annotation Standards The MATE Workbench 1

Towards Corpus Annotation Standards The MATE Workbench 1 Towards Corpus Annotation Standards The MATE Workbench 1 Laila Dybkjær, Niels Ole Bernsen Natural Interactive Systems Laboratory Science Park 10, 5230 Odense M, Denmark E-post: laila@nis.sdu.dk, nob@nis.sdu.dk

More information

Semantic and Multimodal Annotation. CLARA University of Copenhagen August 2011 Susan Windisch Brown

Semantic and Multimodal Annotation. CLARA University of Copenhagen August 2011 Susan Windisch Brown Semantic and Multimodal Annotation CLARA University of Copenhagen 15-26 August 2011 Susan Windisch Brown 2 Program: Monday Big picture Coffee break Lexical ambiguity and word sense annotation Lunch break

More information

Transition-Based Dependency Parsing with Stack Long Short-Term Memory

Transition-Based Dependency Parsing with Stack Long Short-Term Memory Transition-Based Dependency Parsing with Stack Long Short-Term Memory Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, Noah A. Smith Association for Computational Linguistics (ACL), 2015 Presented

More information

Meaning Banking and Beyond

Meaning Banking and Beyond Meaning Banking and Beyond Valerio Basile Wimmics, Inria November 18, 2015 Semantics is a well-kept secret in texts, accessible only to humans. Anonymous I BEG TO DIFFER Surface Meaning Step by step analysis

More information

Online Service for Polish Dependency Parsing and Results Visualisation

Online Service for Polish Dependency Parsing and Results Visualisation Online Service for Polish Dependency Parsing and Results Visualisation Alina Wróblewska and Piotr Sikora Institute of Computer Science, Polish Academy of Sciences, Warsaw, Poland alina@ipipan.waw.pl,piotr.sikora@student.uw.edu.pl

More information

Overview of Annotation Creation: Processes & Tools

Overview of Annotation Creation: Processes & Tools To appear in James Pustejovsky & Nancy Ide (2016) Handbook of Linguistic Annotation. New York: Springer Overview of Annotation Creation: Processes & Tools Mark A. Finlayson and Tomaž Erjavec Abstract Creating

More information

Adaptable and Adaptive Web Information Systems. Lecture 1: Introduction

Adaptable and Adaptive Web Information Systems. Lecture 1: Introduction Adaptable and Adaptive Web Information Systems School of Computer Science and Information Systems Birkbeck College University of London Lecture 1: Introduction George Magoulas gmagoulas@dcs.bbk.ac.uk October

More information

Annotating Spatio-Temporal Information in Documents

Annotating Spatio-Temporal Information in Documents Annotating Spatio-Temporal Information in Documents Jannik Strötgen University of Heidelberg Institute of Computer Science Database Systems Research Group http://dbs.ifi.uni-heidelberg.de stroetgen@uni-hd.de

More information

An Extension of the TIGER Query Language for Treebanks with Frame Semantics Annotation

An Extension of the TIGER Query Language for Treebanks with Frame Semantics Annotation An Extension of the TIGER Query Language for Treebanks with Frame Semantics Annotation Thesis Presentation for M.Sc. LST Torsten Marek Saarland University February 9th, 2009

More information

Tokenization and Sentence Segmentation. Yan Shao Department of Linguistics and Philology, Uppsala University 29 March 2017

Tokenization and Sentence Segmentation. Yan Shao Department of Linguistics and Philology, Uppsala University 29 March 2017 Tokenization and Sentence Segmentation Yan Shao Department of Linguistics and Philology, Uppsala University 29 March 2017 Outline 1 Tokenization Introduction Exercise Evaluation Summary 2 Sentence segmentation

More information

ICD Wiki Framework for Enabling Semantic Web Service Definition and Orchestration

ICD Wiki Framework for Enabling Semantic Web Service Definition and Orchestration ICD Wiki Framework for Enabling Semantic Web Service Definition and Orchestration Dean Brown, Dominick Profico Lockheed Martin, IS&GS, Valley Forge, PA Abstract As Net-Centric enterprises grow, the desire

More information

A Multilingual Social Media Linguistic Corpus

A Multilingual Social Media Linguistic Corpus A Multilingual Social Media Linguistic Corpus Luis Rei 1,2 Dunja Mladenić 1,2 Simon Krek 1 1 Artificial Intelligence Laboratory Jožef Stefan Institute 2 Jožef Stefan International Postgraduate School 4th

More information

Deliverable D1.4 Report Describing Integration Strategies and Experiments

Deliverable D1.4 Report Describing Integration Strategies and Experiments DEEPTHOUGHT Hybrid Deep and Shallow Methods for Knowledge-Intensive Information Extraction Deliverable D1.4 Report Describing Integration Strategies and Experiments The Consortium October 2004 Report Describing

More information

Annotation Science From Theory to Practice and Use Introduction A bit of history

Annotation Science From Theory to Practice and Use Introduction A bit of history Annotation Science From Theory to Practice and Use Nancy Ide Department of Computer Science Vassar College Poughkeepsie, New York 12604 USA ide@cs.vassar.edu Introduction Linguistically-annotated corpora

More information

Specifying and Annotating Reduced Argument Span Via QA-SRL

Specifying and Annotating Reduced Argument Span Via QA-SRL Specifying and Annotating Reduced Argument Span Via QA-SRL Gabriel Stanovsky Meni Adler Ido Dagan Computer Science Department, Bar-Ilan University {gabriel.satanovsky,meni.adler}@gmail.com dagan@cs.biu.ac.il

More information

Describing Computer Languages

Describing Computer Languages Markus Scheidgen Describing Computer Languages Meta-languages to describe languages, and meta-tools to automatically create language tools Doctoral Thesis August 10, 2008 Humboldt-Universität zu Berlin

More information

This document is a preview generated by EVS

This document is a preview generated by EVS INTERNATIONAL STANDARD ISO 24611 First edition 2012-11-01 Language resource management Morpho-syntactic annotation framework (MAF) Gestion des ressources langagières Cadre d'annotation morphosyntaxique

More information

The Goal of this Document. Where to Start?

The Goal of this Document. Where to Start? A QUICK INTRODUCTION TO THE SEMILAR APPLICATION Mihai Lintean, Rajendra Banjade, and Vasile Rus vrus@memphis.edu linteam@gmail.com rbanjade@memphis.edu The Goal of this Document This document introduce

More information

WASA: A Web Application for Sequence Annotation

WASA: A Web Application for Sequence Annotation WASA: A Web Application for Sequence Annotation Fahad AlGhamdi, and Mona Diab Department of Computer Science The George Washington University Washington, DC {fghamdi, mtdiab}@gwu.edu Abstract Data annotation

More information

MICROSOFT WORD. Table of Contents. What is MSWord? Features LINC TWO

MICROSOFT WORD. Table of Contents. What is MSWord? Features LINC TWO Table of Contents What is MSWord? MS Word is a word-processing program that allows users to create, edit, and enhance text in a variety of formats. Word is a powerful word-processor with sophisticated

More information

A Short Introduction to CATMA

A Short Introduction to CATMA A Short Introduction to CATMA Outline: I. Getting Started II. Analyzing Texts - Search Queries in CATMA III. Annotating Texts (collaboratively) with CATMA IV. Further Search Queries: Analyze Your Annotations

More information

Natural Language Requirements

Natural Language Requirements Natural Language Requirements Software Verification and Validation Laboratory Requirement Elaboration Heuristic Domain Model» Requirement Relationship Natural Language is elaborated via Requirement application

More information

GernEdiT: A Graphical Tool for GermaNet Development

GernEdiT: A Graphical Tool for GermaNet Development GernEdiT: A Graphical Tool for GermaNet Development Verena Henrich University of Tübingen Tübingen, Germany. verena.henrich@unituebingen.de Erhard Hinrichs University of Tübingen Tübingen, Germany. erhard.hinrichs@unituebingen.de

More information

IBM Watson Application Developer Workshop. Watson Knowledge Studio: Building a Machine-learning Annotator with Watson Knowledge Studio.

IBM Watson Application Developer Workshop. Watson Knowledge Studio: Building a Machine-learning Annotator with Watson Knowledge Studio. IBM Watson Application Developer Workshop Lab02 Watson Knowledge Studio: Building a Machine-learning Annotator with Watson Knowledge Studio January 2017 Duration: 60 minutes Prepared by Víctor L. Fandiño

More information

Event Extraction as Dependency Parsing for BioNLP 2011

Event Extraction as Dependency Parsing for BioNLP 2011 Event Extraction as Dependency Parsing for BioNLP 2011 David McClosky, Mihai Surdeanu, and Christopher D. Manning Department of Computer Science Stanford University Stanford, CA 94305 {mcclosky,mihais,manning}@stanford.edu

More information

from Pavel Mihaylov and Dorothee Beermann Reviewed by Sc o t t Fa r r a r, University of Washington

from Pavel Mihaylov and Dorothee Beermann Reviewed by Sc o t t Fa r r a r, University of Washington Vol. 4 (2010), pp. 60-65 http://nflrc.hawaii.edu/ldc/ http://hdl.handle.net/10125/4467 TypeCraft from Pavel Mihaylov and Dorothee Beermann Reviewed by Sc o t t Fa r r a r, University of Washington 1. OVERVIEW.

More information

Importing MASC into the ANNIS linguistic database: A case study of mapping GrAF

Importing MASC into the ANNIS linguistic database: A case study of mapping GrAF Importing MASC into the ANNIS linguistic database: A case study of mapping GrAF Arne Neumann 1 Nancy Ide 2 Manfred Stede 1 1 EB Cognitive Science and SFB 632 University of Potsdam 2 Department of Computer

More information

CIS 1.5 Course Objectives. a. Understand the concept of a program (i.e., a computer following a series of instructions)

CIS 1.5 Course Objectives. a. Understand the concept of a program (i.e., a computer following a series of instructions) By the end of this course, students should CIS 1.5 Course Objectives a. Understand the concept of a program (i.e., a computer following a series of instructions) b. Understand the concept of a variable

More information

Semantic Web Company. PoolParty - Server. PoolParty - Technical White Paper.

Semantic Web Company. PoolParty - Server. PoolParty - Technical White Paper. Semantic Web Company PoolParty - Server PoolParty - Technical White Paper http://www.poolparty.biz Table of Contents Introduction... 3 PoolParty Technical Overview... 3 PoolParty Components Overview...

More information

Visually Interacting with a Knowledge Base

Visually Interacting with a Knowledge Base Visually Interacting with a Knowledge Base Using Frames, Logic, and Propositional Graphs With Extended Background Material Daniel R. Schlegel and Stuart C. Shapiro Department of Computer Science and Engineering

More information

Deliverable D TQ Test Suite

Deliverable D TQ Test Suite This document is part of the Coordination and Support Action Preparation and Launch of a Large-scale Action for Quality Translation Technology (QTLaunchPad). This project has received funding from the

More information

VANNOTATOR: a Gesture-driven Annotation Framework for Linguistic and Multimodal Annotation

VANNOTATOR: a Gesture-driven Annotation Framework for Linguistic and Multimodal Annotation VANNOTATOR: a Gesture-driven Annotation Framework for Linguistic and Multimodal Annotation Christian Spiekermann, Giuseppe Abrami, Alexander Mehler Text Technology Lab Goethe-University Frankfurt s2717197@stud.uni-frankfurt.de,

More information

NLP - Based Expert System for Database Design and Development

NLP - Based Expert System for Database Design and Development NLP - Based Expert System for Database Design and Development U. Leelarathna 1, G. Ranasinghe 1, N. Wimalasena 1, D. Weerasinghe 1, A. Karunananda 2 Faculty of Information Technology, University of Moratuwa,

More information

Textual Emigration Analysis

Textual Emigration Analysis Textual Emigration Analysis Andre Blessing and Jonas Kuhn IMS - Universität Stuttgart, Germany clarin@ims.uni-stuttgart.de Abstract We present a web-based application which is called TEA (Textual Emigration

More information

GUIDELINES FOR SPEECH- ACCESSIBLE HTML FOR DRAGON NATURALLYSPEAKING AND DRAGON MEDICAL WHITE PAPER

GUIDELINES FOR SPEECH- ACCESSIBLE HTML FOR DRAGON NATURALLYSPEAKING AND DRAGON MEDICAL WHITE PAPER GUIDELINES FOR SPEECH- ACCESSIBLE HTML FOR DRAGON NATURALLYSPEAKING AND DRAGON MEDICAL WHITE PAPER CONTENTS Overview... 2 General Requirements... 3 Dictation... 3 Elements Problematic For Diction... 4

More information

D4.6 Data Value Chain Database v2

D4.6 Data Value Chain Database v2 D4.6 Data Value Chain Database v2 Coordinator: Fabrizio Orlandi (Fraunhofer) With contributions from: Isaiah Mulang Onando (Fraunhofer), Luis-Daniel Ibáñez (SOTON) Reviewer: Ryan Goodman (ODI) Deliverable

More information

Applying Model Intelligence Frameworks for Deployment Problem in Real-Time and Embedded Systems

Applying Model Intelligence Frameworks for Deployment Problem in Real-Time and Embedded Systems Applying Model Intelligence Frameworks for Deployment Problem in Real-Time and Embedded Systems Andrey Nechypurenko 1, Egon Wuchner 1, Jules White 2, and Douglas C. Schmidt 2 1 Siemens AG, Corporate Technology

More information

An Adaptive Framework for Named Entity Combination

An Adaptive Framework for Named Entity Combination An Adaptive Framework for Named Entity Combination Bogdan Sacaleanu 1, Günter Neumann 2 1 IMC AG, 2 DFKI GmbH 1 New Business Department, 2 Language Technology Department Saarbrücken, Germany E-mail: Bogdan.Sacaleanu@im-c.de,

More information

2 nd UML 2 Semantics Symposium: Formal Semantics for UML

Manfred Broy 1, Michelle L. Crane 2, Juergen Dingel 2, Alan Hartman 3, Bernhard Rumpe 4, and Bran Selic 5 1 Technische Universität München, Germany

Microsoft SharePoint Server 2013 Plan, Configure & Manage

Course 20331-20332B 5 Days Instructor-led, Hands on Course Information This five day instructor-led course omits the overlap and redundancy that

Ian Sommerville 2006 Software Engineering, 8th edition. Chapter 22 Slide 1

Verification and Validation Slide 1 Objectives To introduce software verification and validation and to discuss the distinction between them To describe the program inspection process and its role in V

A Quick Guide to MaltParser Optimization

Joakim Nivre Johan Hall 1 Introduction MaltParser is a system for data-driven dependency parsing, which can be used to induce a parsing model from treebank data

Adding Usability to Web Engineering Models and Tools

Richard Atterer 1 and Albrecht Schmidt 2 1 Media Informatics Group Ludwig-Maximilians-University Munich, Germany richard.atterer@ifi.lmu.de 2 Embedded

Microsoft Core Solutions of Microsoft SharePoint Server 2013

1800 ULEARN (853 276) www.ddls.com.au Microsoft 20331 - Core Solutions of Microsoft SharePoint Server 2013 Length 5 days Price $4290.00 (inc GST) Version B Overview This course will provide you with the

Text mining tools for semantically enriching the scientific literature

Sophia Ananiadou Director National Centre for Text Mining School of Computer Science University of Manchester Need for enriching the

Educational Fusion. Implementing a Production Quality User Interface With JFC

Kevin Kennedy Prof. Seth Teller 6.199 May 1999 Abstract Educational Fusion is an online algorithmic teaching program implemented

Ontology-based Reasoning about Lexical Resources

Jan Scheffczyk, Collin F. Baker, Srini Narayanan International Computer Science Institute 1947 Center St., Suite 600, Berkeley, CA, 94704 jan,collinb,snarayan

Ngram Search Engine with Patterns Combining Token, POS, Chunk and NE Information

Satoshi Sekine Computer Science Department New York University sekine@cs.nyu.edu Kapil Dalwani Computer Science Department

TectoMT: Modular NLP Framework

Martin Popel, Zdeněk Žabokrtský ÚFAL, Charles University in Prague IceTAL, 7th International Conference on Natural Language Processing August 17, 2010, Reykjavik Outline Motivation

Theories of User Interface Design

High-Level Theories Foley and van Dam four-level approach GOMS Goals, Operators, Methods, and Selection Rules Conceptual level: Foley and van Dam User's mental model of

Mention Detection: Heuristics for the OntoNotes annotations

Jonathan K. Kummerfeld, Mohit Bansal, David Burkett and Dan Klein Computer Science Division University of California at Berkeley {jkk,mbansal,dburkett,klein}@cs.berkeley.edu

Chapter 4. Fundamental Concepts and Models

4.1 Roles and Boundaries 4.2 Cloud Characteristics 4.3 Cloud Delivery Models 4.4 Cloud Deployment Models The upcoming sections cover introductory topic areas

Aid to spatial navigation within a UIMA annotation index

Nicolas Hernandez Université de Nantes Abstract. In order to support the interoperability within UIMA workflows, we address the problem of accessing

Syntax and Grammars 1 / 21

Outline What is a language? Abstract syntax and grammars Abstract syntax vs. concrete syntax Encoding grammars as Haskell data types What is a language?

Ortolang Tools : MarsaTag

Stéphane Rauzy, Philippe Blache, Grégoire de Montcheuil SECOND VARIAMU WORKSHOP LPL, Aix-en-Provence August 20th & 21st, 2014 ORTOLANG received a State aid under the «Investissements

Center for Reflected Text Analytics. Lecture 2 Annotation tools & Segmentation

Summary of Part 1 Annotation theory Guidelines Inter-Annotator agreement Inter-subjective annotations Annotation exercise Discuss

A Flexible Distributed Architecture for Natural Language Analyzers

Xavier Carreras & Lluís Padró TALP Research Center Departament de Llenguatges i Sistemes Informàtics Universitat Politècnica de Catalunya

Better Evaluation for Grammatical Error Correction

Daniel Dahlmeier 1 and Hwee Tou Ng 1,2 1 NUS Graduate School for Integrative Sciences and Engineering 2 Department of Computer Science, National University

Parallel Concordancing and Translation. Michael Barlow

[Translating and the Computer 26, November 2004 [London: Aslib, 2004]] Dept. of Applied Language Studies and Linguistics University of Auckland Auckland,

Best practices in the design, creation and dissemination of speech corpora at The Language Archive

LREC Workshop 18 2012-05-21 Istanbul Sebastian Drude, Daan Broeder, Peter Wittenburg, Han Sloetjes The