Tools for Annotating and Searching Corpora Practical Session 1: Annotating
1 Tools for Annotating and Searching Corpora. Practical Session 1: Annotating. Stefanie Dipper, Institute of Linguistics, Ruhr-University Bochum. Corpus Linguistics Fest (CLiF), June 6-10, 2016, Indiana University, Bloomington. Stefanie Dipper Tools for annotating and searching 1 / 28
2 Today's session: (1) We take a closer look at a particular POS tagset, the Penn Treebank tagset. (2) We annotate some text (manually), using the tool WebAnno. (3) We evaluate our agreement, using different measures.
3 Outline
4 Penn Treebank. The Penn Treebank Project is one of the earliest and most influential annotation projects: 4.5 million words of American English, including the Brown Corpus and the Wall Street Journal Corpus, manually annotated with POS tags and syntactic structures. We only look at the POS tags today.
5 Tag labels. Labels that mark related parts of speech start with the same letter: NN, NNS, NNP, NNPS are subtypes of nouns. The final parts of some labels mirror inflectional endings: JJ, JJR, JJS are positive, comparative, and superlative adjectives; VB, VBD, VBG, VBN, VBP, VBZ are base form, past tense, gerund, past participle, non-3rd person singular present tense, and 3rd person singular present tense verbs.
6 Documentation. Alphabetical list of labels with examples: comp.leeds.ac.uk/ccalas/tagsets/upenn.html (the punctuation labels have been added later; these are not available in the full guidelines). Full guidelines: cgi/viewcontent.cgi?article=1603&context=cis_reports
7 Outline
8 WebAnno: a web-based annotation tool. Supports different kinds of annotations: spans and links/pointers; we only use simple spans today (spans = tokens). Supports crowd annotations: annotations by multiple users, followed by curation. Provides agreement measures. (Or else: tu-darmstadt.de/webanno-testing)
9 WebAnno: Exercise. Go to this page: uni-tuebingen.de/ Annotate the text in the project Penn_Anno according to the Penn Tagset.
10 Outline
11 Manual annotations (for an overview, see Artstein and Poesio 2008). Manual annotations are provided either by experts (e.g. linguists) or by crowdsourcing. If we want to use the annotations, they should be correct. Question: how do we know how good our annotations are? Take several annotators, let them independently annotate the same text, and compare their annotations. If they assign the same tags most of the time, the tag meanings and guidelines are well defined, the annotators are well trained, and the resulting annotations are of good quality. If there is rather low agreement, this could also mean that the annotation task is very difficult. How do we rate agreement? With measures for inter-annotator agreement (IAA), also known as inter-rater or inter-coder agreement.
12 Comparing two annotators (slides adapted from Poesio and Carpenter 2010; Carpenter and Poesio 2010). Compare the annotation results of two annotators (here: annotated tags A and B):

Item  Annotator 1  Annotator 2
1     A            B
2     B            A
3     A            A
4     A            B
5     B            B
6     B            B
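The agreement on a table like this can be computed directly. A minimal Python sketch (the function name is ours, not from the slides), using the six items above:

```python
# The six items from the slide, one label sequence per annotator.
ann1 = ["A", "B", "A", "A", "B", "B"]
ann2 = ["B", "A", "A", "B", "B", "B"]

def observed_agreement(a, b):
    """Fraction of items to which both annotators assign the same tag."""
    assert len(a) == len(b)
    return sum(x == y for x, y in zip(a, b)) / len(a)

print(observed_agreement(ann1, ann2))  # agreement on items 3, 5, 6 -> 0.5
```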
13 Observed agreement. Assumption: 100 annotations in total. The results are displayed in a 2x2 contingency table, cross-tabulating the two annotators' labels (A vs. B) with row and column totals. Agreement? .88 = observed agreement (percent agreement): the proportion of items on which both annotators choose the same label.
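Observed agreement can also be read off a contingency table as the diagonal mass divided by the total. The cell counts below are illustrative only, chosen so that A_o = .88 (the slide's actual counts did not survive transcription):

```python
from collections import Counter

# Illustrative 100-item data set: 44 A-A pairs, 6 A-B, 6 B-A, 44 B-B.
ann1 = ["A"] * 50 + ["B"] * 50
ann2 = ["A"] * 44 + ["B"] * 6 + ["A"] * 6 + ["B"] * 44

table = Counter(zip(ann1, ann2))  # maps (label1, label2) -> cell count
n = sum(table.values())
a_o = sum(c for (x, y), c in table.items() if x == y) / n
print(a_o)  # diagonal mass 88 of 100 -> 0.88
```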
14 Chance agreement. Some agreement is to be expected simply by chance: e.g. two annotators who annotate A or B at random agree approximately half of the time. The amount of chance agreement depends on the annotation scheme and the annotated data. Sensible agreement is the amount above chance.
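The "half of the time" claim is easy to check by simulation, under the assumption that both annotators label uniformly at random over two categories (a sketch, not from the slides):

```python
import random

random.seed(0)  # make the simulation reproducible
n = 100_000
ann1 = [random.choice("AB") for _ in range(n)]
ann2 = [random.choice("AB") for _ in range(n)]
agree = sum(x == y for x, y in zip(ann1, ann2)) / n
print(agree)  # close to 0.5
```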
15 Expected agreement. Observed agreement (A_o): the amount of actual agreement. Expected agreement (A_e): the expected value of A_o under chance. Agreement above chance: A_o - A_e. Maximally possible agreement above chance: 1 - A_e. Proportion of sensible agreement: (A_o - A_e) / (1 - A_e). Question: how do we compute chance agreement A_e? There are different ways...
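All the chance-corrected coefficients that follow (S, π, κ) share this form and differ only in how A_e is estimated. As a sketch (the function name is ours):

```python
def chance_corrected(a_o, a_e):
    """Proportion of sensible agreement: (A_o - A_e) / (1 - A_e)."""
    return (a_o - a_e) / (1 - a_e)

print(chance_corrected(0.88, 0.5))  # 0.76
print(chance_corrected(0.5, 0.5))   # 0.0: no agreement above chance
```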
16 Measure S: considers the number of categories. S assumes the same chance for all annotators and categories. Number of category labels: q. Probability that an annotator picks a particular category q_a: 1/q. Probability that both annotators pick a particular category q_a: (1/q)^2. Probability that both annotators pick the same category: A_e^S = sum over the q categories of (1/q)^2 = q * (1/q)^2 = 1/q.
17 Are the categories equally likely? With two categories (A, B): A_o = .88, A_e = 1/2 = .5, S = (.88 - .5) / (1 - .5) = .76. With four categories (A, B, C, D) and the same observed agreement: A_o = .88, A_e = 1/4 = .25, S = (.88 - .25) / (1 - .25) = .84.
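Both computations follow directly from A_e^S = 1/q. A sketch reproducing the slide's two values (function name ours):

```python
def s_measure(a_o, num_categories):
    """Measure S: chance agreement is uniform over the label set."""
    a_e = 1 / num_categories
    return (a_o - a_e) / (1 - a_e)

print(round(s_measure(0.88, 2), 2))  # 0.76
print(round(s_measure(0.88, 4), 2))  # 0.84
```

Note that S grows with the number of categories even though the observed agreement is unchanged.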
18 π: different chance for different categories. Scott's π assumes a different chance for each category (Scott 1955). Number of annotations: N. Number of annotations with category q_a: n_{q_a}. Probability that an annotator picks a particular category q_a: n_{q_a}/N. Probability that both annotators pick a particular category q_a: (n_{q_a}/N)^2. Probability that both annotators pick the same category: A_e^π = sum over categories q of (n_q/N)^2 = (1/N^2) * sum over q of n_q^2.
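A sketch of π over two label sequences (function name ours); as in the formula above, A_e pools the marginals of both annotators. On the six-item example from earlier, observed agreement (.5) is just below pooled chance agreement (about .514), so π comes out slightly negative:

```python
from collections import Counter

def scotts_pi(a, b):
    """Scott's pi for two annotators over parallel label sequences."""
    n = len(a)
    a_o = sum(x == y for x, y in zip(a, b)) / n
    pooled = Counter(a) + Counter(b)  # category counts over all 2n annotations
    a_e = sum((c / (2 * n)) ** 2 for c in pooled.values())
    return (a_o - a_e) / (1 - a_e)

ann1 = ["A", "B", "A", "A", "B", "B"]
ann2 = ["B", "A", "A", "B", "B", "B"]
print(round(scotts_pi(ann1, ann2), 3))  # -0.029
```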
19 Comparison of S and π. First 3x3 table (categories A, B, C): A_o = .88, S = (.88 - 1/3) / (1 - 1/3) = .82, π = .76. Second 3x3 table, same A_o but with a more skewed category distribution: A_o = .88, S = (.88 - 1/3) / (1 - 1/3) = .82, π = .647. S is insensitive to the category distribution; π is not.
20 Prevalence. Imagine: two annotators disambiguate 1000 instances of love: emotion vs. zero (as in tennis). Each annotator found 995 instances of emotion and 5 of zero, but in different cases. How useful are these annotations? A_o = .99, S = (.99 - .5) / (1 - .5) = .98, but A_e^π = .995^2 + .005^2 = .99005, so π = (.99 - .99005) / (1 - .99005) ≈ -.005: virtually no agreement above chance.
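The love example can be reproduced end-to-end; the label sequences below realize "each annotator marks 5 instances as zero, but different ones" (the exact positions are our choice and do not affect the result). Despite .99 raw agreement, π is essentially zero:

```python
from collections import Counter

# 1000 instances of "love"; the annotators' 5 "zero" cases do not overlap.
ann1 = ["emotion"] * 995 + ["zero"] * 5
ann2 = ["emotion"] * 990 + ["zero"] * 5 + ["emotion"] * 5

n = len(ann1)
a_o = sum(x == y for x, y in zip(ann1, ann2)) / n          # 0.99
pooled = Counter(ann1) + Counter(ann2)
a_e = sum((c / (2 * n)) ** 2 for c in pooled.values())     # 0.99005
pi = (a_o - a_e) / (1 - a_e)
print(round(a_o, 2), round(pi, 3))  # 0.99 -0.005
```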
21 Kappa: considers individual bias. Cohen's κ assumes that different annotators have different interpretations of the guidelines (bias/prejudice) (Cohen 1960; Carletta 1996). Total number of items/markables: i. Probability that annotator c_x picks a particular category q_a: n_{c_x q_a}/i. Probability that both annotators pick a particular category q_a: (n_{c_1 q_a}/i) * (n_{c_2 q_a}/i). Probability that both annotators pick the same category: A_e^κ = sum over categories q of (n_{c_1 q}/i) * (n_{c_2 q}/i) = (1/i^2) * sum over q of n_{c_1 q} * n_{c_2 q}.
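The only change from π is that A_e uses each annotator's own marginals instead of the pooled ones. A sketch (function name ours), again on the six-item example; here κ happens to be 0, since with these marginals all of the observed agreement is expected by chance:

```python
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's kappa for two annotators over parallel label sequences."""
    n = len(a)
    a_o = sum(x == y for x, y in zip(a, b)) / n
    m1, m2 = Counter(a), Counter(b)  # each annotator's own marginals
    a_e = sum((m1[q] / n) * (m2[q] / n) for q in m1.keys() | m2.keys())
    return (a_o - a_e) / (1 - a_e)

ann1 = ["A", "B", "A", "A", "B", "B"]
ann2 = ["B", "A", "A", "B", "B", "B"]
print(round(cohens_kappa(ann1, ann2), 3))  # 0.0
```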
22 Comparison of S, π and κ. First table: A_o = .88, S = (.88 - .5) / (1 - .5) = .76, π = .759, κ = .76. Second table: A_o = .3, S = (.3 - .5) / (1 - .5) = -.4, π = -.414, κ = -.129.
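The first comparison can be reconstructed. The cell counts below are hypothetical (the slide's own table was lost in transcription) but chosen so that they reproduce the reported scores A_o = .88, S = .76, π = .759, κ = .76:

```python
from collections import Counter

# Hypothetical contingency table: (A,A)=47, (A,B)=9, (B,A)=3, (B,B)=41.
ann1 = ["A"] * 56 + ["B"] * 44
ann2 = ["A"] * 47 + ["B"] * 9 + ["A"] * 3 + ["B"] * 41

n = len(ann1)
a_o = sum(x == y for x, y in zip(ann1, ann2)) / n
m1, m2 = Counter(ann1), Counter(ann2)
cats = m1.keys() | m2.keys()

def corrected(a_e):
    return (a_o - a_e) / (1 - a_e)

s = corrected(1 / len(cats))
pi = corrected(sum(((m1[q] + m2[q]) / (2 * n)) ** 2 for q in cats))
kappa = corrected(sum((m1[q] / n) * (m2[q] / n) for q in cats))
print(round(s, 3), round(pi, 3), round(kappa, 3))  # 0.76 0.759 0.76
```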
23 Comparison of π and κ. It can be proven that for any sample: π ≤ κ. If annotators interpret the guidelines differently, this is bad; it is reflected by π (but not by κ). With many annotators, the difference between π and κ is small.
24 Interpreting agreement scores. κ = 0: no agreement above chance; κ = 1: perfect agreement. κ < .7 is often considered bad agreement (this threshold is controversial). According to Landis and Koch (1977): κ < 0: poor agreement; 0 to .20: slight; .21 to .40: fair; .41 to .60: moderate; .61 to .80: substantial; .81 to 1: almost perfect.
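The Landis and Koch bands translate directly into a small lookup (a sketch; the band boundaries follow the scale above, with each band closed at its upper end):

```python
def landis_koch(kappa):
    """Verbal interpretation of a kappa score after Landis and Koch (1977)."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial"), (1.00, "almost perfect")]:
        if kappa <= upper:
            return label

print(landis_koch(0.76))  # substantial
print(landis_koch(-0.1))  # poor
```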
25 Multiple annotators (> 2). Either: average the pairwise agreement scores. Or: use dedicated measures, such as Fleiss' κ.
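For more than two annotators, Fleiss' κ works on per-item category counts rather than paired label sequences. A self-contained sketch (not from the slides; input format: one row per item, one column per category, each row summing to the number of raters):

```python
def fleiss_kappa(ratings):
    """Fleiss' kappa. ratings[i][j] = number of raters assigning category j
    to item i; every item must receive the same number of ratings."""
    n_items = len(ratings)
    n_raters = sum(ratings[0])
    n_cats = len(ratings[0])
    # overall proportion of assignments per category
    p = [sum(row[j] for row in ratings) / (n_items * n_raters)
         for j in range(n_cats)]
    # mean per-item agreement
    p_bar = sum((sum(c * c for c in row) - n_raters)
                / (n_raters * (n_raters - 1)) for row in ratings) / n_items
    p_e = sum(x * x for x in p)
    return (p_bar - p_e) / (1 - p_e)

# 2 items, 3 raters each, perfect agreement on both items:
print(fleiss_kappa([[3, 0], [0, 3]]))  # 1.0
```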
26 Online tools
27 References I
Artstein, R. and M. Poesio (2008). Inter-coder agreement for computational linguistics (survey article). Computational Linguistics 34(4).
Carletta, J. (1996). Assessing agreement on classification tasks: the kappa statistic. Computational Linguistics 22(2).
Carpenter, B. and M. Poesio (2010). Models of data annotation. malta-2010-slides.pdf. Slides from the LREC 2010 tutorial (part II).
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement 20(1).
28 References II
Landis, J. R. and G. G. Koch (1977). The measurement of observer agreement for categorical data. Biometrics 33(1).
Poesio, M. and B. Carpenter (2010). Statistical models of the annotation process. Part I: lrec-sli.pdf. Slides from the LREC 2010 tutorial (part I).
Scott, W. A. (1955). Reliability of content analysis: The case of nominal scale coding. Public Opinion Quarterly 19(3).
More informationA bit of theory: Algorithms
A bit of theory: Algorithms There are different kinds of algorithms Vector space models. e.g. support vector machines Decision trees, e.g. C45 Probabilistic models, e.g. Naive Bayes Neural networks, e.g.
More informationProjektgruppe. Michael Meier. Named-Entity-Recognition Pipeline
Projektgruppe Michael Meier Named-Entity-Recognition Pipeline What is Named-Entitiy-Recognition? Named-Entity Nameable objects in the world, e.g.: Person: Albert Einstein Organization: Deutsche Bank Location:
More informationA Multiclassifier based Approach for Word Sense Disambiguation using Singular Value Decomposition
A Multiclassifier based Approach for Word Sense Disambiguation using Singular Value Decomposition Ana Zelaia, Olatz Arregi and Basilio Sierra Computer Science Faculty University of the Basque Country ana.zelaia@ehu.es
More informationIntroduction to Text Mining. Aris Xanthos - University of Lausanne
Introduction to Text Mining Aris Xanthos - University of Lausanne Preliminary notes Presentation designed for a novice audience Text mining = text analysis = text analytics: using computational and quantitative
More informationA Multiclassifier based Approach for Word Sense Disambiguation using Singular Value Decomposition
A Multiclassifier based Approach for Word Sense Disambiguation using Singular Value Decomposition Ana Zelaia, Olatz Arregi and Basilio Sierra Computer Science Faculty University of the Basque Country ana.zelaia@ehu.es
More informationIntroducing XAIRA. Lou Burnard Tony Dodd. An XML aware tool for corpus indexing and searching. Research Technology Services, OUCS
Introducing XAIRA An XML aware tool for corpus indexing and searching Lou Burnard Tony Dodd Research Technology Services, OUCS What is XAIRA? XML Aware Indexing and Retrieval Architecture Developed from
More informationSAPIENT Automation project
Dr Maria Liakata Leverhulme Trust Early Career fellow Department of Computer Science, Aberystwyth University Visitor at EBI, Cambridge mal@aber.ac.uk 25 May 2010, London Motivation SAPIENT Automation Project
More informationUnsupervised Keyword Extraction from Single Document. Swagata Duari Aditya Gupta Vasudha Bhatnagar
Unsupervised Keyword Extraction from Single Document Swagata Duari Aditya Gupta Vasudha Bhatnagar Presentation Outline Introduction and Motivation Statistical Methods for Automatic Keyword Extraction Graph-based
More informationBD003: Introduction to NLP Part 2 Information Extraction
BD003: Introduction to NLP Part 2 Information Extraction The University of Sheffield, 1995-2017 This work is licenced under the Creative Commons Attribution-NonCommercial-ShareAlike Licence. Contents This
More informationSemantics Isn t Easy Thoughts on the Way Forward
Semantics Isn t Easy Thoughts on the Way Forward NANCY IDE, VASSAR COLLEGE REBECCA PASSONNEAU, COLUMBIA UNIVERSITY COLLIN BAKER, ICSI/UC BERKELEY CHRISTIANE FELLBAUM, PRINCETON UNIVERSITY New York University
More informationContents. List of Figures. List of Tables. Acknowledgements
Contents List of Figures List of Tables Acknowledgements xiii xv xvii 1 Introduction 1 1.1 Linguistic Data Analysis 3 1.1.1 What's data? 3 1.1.2 Forms of data 3 1.1.3 Collecting and analysing data 7 1.2
More informationJubilee: Propbank Instance Editor Guideline (Version 2.1)
Jubilee: Propbank Instance Editor Guideline (Version 2.1) Jinho D. Choi choijd@colorado.edu Claire Bonial bonial@colorado.edu Martha Palmer mpalmer@colorado.edu Center for Computational Language and EducAtion
More informationService Control EasyApp Measuring Quality of Experience
Service Control EasyApp Measuring Quality of Experience Abstract This Cisco Service Control Engine (SCE) EasyApp memo explains the concept of quality of experience (QoE), an approach to measure network
More informationCorpus Linguistics for NLP APLN550. Adam Meyers Montclair State University 9/22/2014 and 9/29/2014
Corpus Linguistics for NLP APLN550 Adam Meyers Montclair State University 9/22/ and 9/29/ Text Corpora in NLP Corpus Selection Corpus Annotation: Purpose Representation Issues Linguistic Methods Measuring
More informationIBM Watson Application Developer Workshop. Watson Knowledge Studio: Building a Machine-learning Annotator with Watson Knowledge Studio.
IBM Watson Application Developer Workshop Lab02 Watson Knowledge Studio: Building a Machine-learning Annotator with Watson Knowledge Studio January 2017 Duration: 60 minutes Prepared by Víctor L. Fandiño
More informationStandardization in Assessment and Reporting of Intercoder Reliability in Content Analyses
Standardization in Assessment and Reporting of Intercoder Reliability in Content Analyses Matthew Lombard, Temple University December 5, 2008 University of Michigan Overview History of interest in topic
More informationIntroduction to Information Extraction (IE) and ANNIE
Module 1 Session 2 Introduction to Information Extraction (IE) and ANNIE The University of Sheffield, 1995-2015 This work is licenced under the Creative Commons Attribution-NonCommercial-ShareAlike Licence.
More informationThe Muc7 T Corpus. 1 Introduction. 2 Creation of Muc7 T
The Muc7 T Corpus Katrin Tomanek and Udo Hahn Jena University Language & Information Engineering (JULIE) Lab Friedrich-Schiller-Universität Jena, Germany {katrin.tomanek udo.hahn}@uni-jena.de 1 Introduction
More informationWEB HARVESTING AND SENTIMENT ANALYSIS OF CONSUMER FEEDBACK
WEB HARVESTING AND SENTIMENT ANALYSIS OF CONSUMER FEEDBACK Emil Şt. CHIFU, Tiberiu Şt. LEŢIA, Bogdan BUDIŞAN, Viorica R. CHIFU Faculty of Automation and Computer Science, Technical University of Cluj-Napoca
More informationCS473: Course Review CS-473. Luo Si Department of Computer Science Purdue University
CS473: CS-473 Course Review Luo Si Department of Computer Science Purdue University Basic Concepts of IR: Outline Basic Concepts of Information Retrieval: Task definition of Ad-hoc IR Terminologies and
More informationConditional Random Fields. Mike Brodie CS 778
Conditional Random Fields Mike Brodie CS 778 Motivation Part-Of-Speech Tagger 2 Motivation object 3 Motivation I object! 4 Motivation object Do you see that object? 5 Motivation Part-Of-Speech Tagger -
More informationNatural Language Processing Tutorial May 26 & 27, 2011
Cognitive Computation Group Natural Language Processing Tutorial May 26 & 27, 2011 http://cogcomp.cs.illinois.edu So why aren t words enough? Depends on the application more advanced task may require more
More informationQuery Difficulty Prediction for Contextual Image Retrieval
Query Difficulty Prediction for Contextual Image Retrieval Xing Xing 1, Yi Zhang 1, and Mei Han 2 1 School of Engineering, UC Santa Cruz, Santa Cruz, CA 95064 2 Google Inc., Mountain View, CA 94043 Abstract.
More informationMaca a configurable tool to integrate Polish morphological data. Adam Radziszewski Tomasz Śniatowski Wrocław University of Technology
Maca a configurable tool to integrate Polish morphological data Adam Radziszewski Tomasz Śniatowski Wrocław University of Technology Outline Morphological resources for Polish Tagset and segmentation differences
More informationStructured Prediction Basics
CS11-747 Neural Networks for NLP Structured Prediction Basics Graham Neubig Site https://phontron.com/class/nn4nlp2017/ A Prediction Problem I hate this movie I love this movie very good good neutral bad
More informationTectoMT: Modular NLP Framework
: Modular NLP Framework Martin Popel, Zdeněk Žabokrtský ÚFAL, Charles University in Prague IceTAL, 7th International Conference on Natural Language Processing August 17, 2010, Reykjavik Outline Motivation
More informationANC2Go: A Web Application for Customized Corpus Creation
ANC2Go: A Web Application for Customized Corpus Creation Nancy Ide, Keith Suderman, Brian Simms Department of Computer Science, Vassar College Poughkeepsie, New York 12604 USA {ide, suderman, brsimms}@cs.vassar.edu
More informationSemantic Pattern Classification
PFL054 Term project 2011/2012 Semantic Pattern Classification Ema Krejčová 1 Introduction The aim of the project is to construct classifiers which should assign semantic patterns to six given verbs, as
More informationBuilding Multilingual Resources and Neural Models for Word Sense Disambiguation. Alessandro Raganato March 15th, 2018
Building Multilingual Resources and Neural Models for Word Sense Disambiguation Alessandro Raganato March 15th, 2018 About me alessandro.raganato@helsinki.fi http://wwwusers.di.uniroma1.it/~raganato ERC
More informationFunctional Semantic Categories for Art History Text: Human Labeling and Preliminary Machine Learning
Functional Semantic Categories for Art History Text: Human Labeling and Preliminary Machine Learning Rebecca J. Passonneau 1, Tae Yano 2, Tom Lippincott 3, and Judith Klavans 4 1 Center for Computational
More informationInformation Retrieval. Lecture 7
Information Retrieval Lecture 7 Recap of the last lecture Vector space scoring Efficiency considerations Nearest neighbors and approximations This lecture Evaluating a search engine Benchmarks Precision
More informationA Methodology for Evaluating Aggregated Search Results
A Methodology for Evaluating Aggregated Search Results Jaime Arguello 1, Fernando Diaz 2, Jamie Callan 1, and Ben Carterette 3 1 Carnegie Mellon University 2 Yahoo! Research 3 University of Delaware Abstract.
More informationQuestion Answering Systems
Question Answering Systems An Introduction Potsdam, Germany, 14 July 2011 Saeedeh Momtazi Information Systems Group Outline 2 1 Introduction Outline 2 1 Introduction 2 History Outline 2 1 Introduction
More informationKH Coder 3 Reference Manual
KH Coder 3 Reference Manual Koichi HIGUCHI * 1 March 16, 2016 *1 Ritsumeikan University i Contents A KH Coder Reference Manual 1 A.1 Setup...............................................
More informationPrivacy and Security in Online Social Networks Department of Computer Science and Engineering Indian Institute of Technology, Madras
Privacy and Security in Online Social Networks Department of Computer Science and Engineering Indian Institute of Technology, Madras Lecture - 25 Tutorial 5: Analyzing text using Python NLTK Hi everyone,
More informationLING/C SC 581: Advanced Computational Linguistics. Lecture Notes Jan 23 rd
LING/C SC 581: Advanced Computational Linguistics Lecture Notes Jan 23 rd Today's Topics Homework 2 review Homework 2 review Write a Python program to print out the number of syllables in a word (in CMUdict).
More informationPackage corenlp. June 3, 2015
Type Package Title Wrappers Around Stanford CoreNLP Tools Version 0.4-1 Author Taylor Arnold, Lauren Tilton Package corenlp June 3, 2015 Maintainer Taylor Arnold Provides a minimal
More informationEvaluation. David Kauchak cs160 Fall 2009 adapted from:
Evaluation David Kauchak cs160 Fall 2009 adapted from: http://www.stanford.edu/class/cs276/handouts/lecture8-evaluation.ppt Administrative How are things going? Slides Points Zipf s law IR Evaluation For
More informationLexical Semantics. Regina Barzilay MIT. October, 5766
Lexical Semantics Regina Barzilay MIT October, 5766 Last Time: Vector-Based Similarity Measures man woman grape orange apple n Euclidian: x, y = x y = i=1 ( x i y i ) 2 n x y x i y i i=1 Cosine: cos( x,
More informationMaking Sense Out of the Web
Making Sense Out of the Web Rada Mihalcea University of North Texas Department of Computer Science rada@cs.unt.edu Abstract. In the past few years, we have witnessed a tremendous growth of the World Wide
More informationIntroduction to IE and ANNIE
Introduction to IE and ANNIE The University of Sheffield, 1995-2013 This work is licenced under the Creative Commons Attribution-NonCommercial-ShareAlike Licence. About this tutorial This tutorial comprises
More informationXML Support for Annotated Language Resources
XML Support for Annotated Language Resources Nancy Ide Department of Computer Science Vassar College Poughkeepsie, New York USA ide@cs.vassar.edu Laurent Romary Equipe Langue et Dialogue LORIA/CNRS Vandoeuvre-lès-Nancy,
More informationFeature Based Sentimental Analysis on Mobile Web Domain
IOSR Journal of Engineering (IOSRJEN) ISSN (e): 2250-3021, ISSN (p): 2278-8719 Vol. 08, Issue 6 (June. 2018), V (VII) PP 01-07 www.iosrjen.org Shoieb Ahamed 1 Alit Danti 2 1. Government First Grade College,
More informationHomework 2: Parsing and Machine Learning
Homework 2: Parsing and Machine Learning COMS W4705_001: Natural Language Processing Prof. Kathleen McKeown, Fall 2017 Due: Saturday, October 14th, 2017, 2:00 PM This assignment will consist of tasks in
More information