Tools for Annotating and Searching Corpora Practical Session 1: Annotating


Stefanie Dipper, Institute of Linguistics, Ruhr-University Bochum
Corpus Linguistics Fest (CLiF), June 6-10, 2016, Indiana University, Bloomington

Today's session
1. We take a closer look at a particular POS tagset: the Penn Treebank tagset.
2. We annotate some text manually, using the tool WebAnno.
3. We evaluate our agreement, using different measures.

Outline

Penn Treebank
The Penn Treebank Project: one of the earliest and most influential annotation projects.
4.5 million words of American English.
Includes the Brown Corpus and the Wall Street Journal corpus.
Manually annotated with POS tags and syntactic structures; we only look at the POS tags today.

Tag labels
Labels that mark related parts of speech start with the same letter:
NN, NNS, NNP, NNPS: subtypes of nouns.
The final parts of some labels mirror inflectional endings:
JJ, JJR, JJS: positive, comparative, superlative adjectives.
VB, VBD, VBG, VBN, VBP, VBZ: base form, past tense, gerund, past participle, non-3rd sg present tense, 3rd sg present tense.
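
A quick way to see these labels in action (a sketch, not from the slides): NLTK's default tagger outputs Penn Treebank tags. This assumes an NLTK installation with the punkt and averaged_perceptron_tagger resources.

```python
import nltk

# One-time resource downloads (uncomment on first run):
# nltk.download("punkt")
# nltk.download("averaged_perceptron_tagger")

sentence = "The annotators tagged the sentences carefully."

# nltk.pos_tag uses the Penn Treebank tagset by default.
for token, tag in nltk.pos_tag(nltk.word_tokenize(sentence)):
    print(f"{token}\t{tag}")
# Typical output: The/DT annotators/NNS tagged/VBD the/DT sentences/NNS carefully/RB ./.
```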

Documentation
Alphabetical list of labels with examples: comp.leeds.ac.uk/ccalas/tagsets/upenn.html
(The punctuation labels were added later; they are not covered in the full guidelines.)
Full guidelines: cgi/viewcontent.cgi?article=1603&context=cis_reports

Outline

WebAnno: a web-based annotation tool
Supports different kinds of annotations: spans, links/pointers. We only use simple spans today (spans = tokens).
Supports crowd annotations: annotations by multiple users, followed by curation.
Provides agreement measures.
(or else: tu-darmstadt.de/webanno-testing)

WebAnno: Exercise
Go to this page: uni-tuebingen.de/
Annotate the text in the project Penn_Anno according to the Penn Tagset.

Outline

Manual annotations
For an overview: Artstein and Poesio (2008).
Manual annotations are provided either by experts (e.g. linguists) or by crowdsourcing.
If we want to use the annotations, they should be correct.
Question: how do we know how good our annotations are?
- Take several annotators and let them annotate the same text independently.
- Compare their annotations: if they assign the same tags most of the time, the tag meanings and guidelines are well defined, the annotators are well trained, and the resulting annotations are of good quality.
- If agreement is rather low, this could also mean that the annotation task is very difficult.
How do we rate agreement? With measures for inter-annotator agreement (IAA), aka inter-rater / inter-coder agreement.

Comparing two annotators
Slides adapted from Poesio and Carpenter (2010), Carpenter and Poesio (2010).
Compare the annotation results of two annotators (here: annotated tags A and B):

Item  Annotator 1  Annotator 2
1     A            B
2     B            A
3     A            A
4     A            B
5     B            B
6     B            B

Observed agreement
Assumption: 100 annotations in total. The results are displayed in a contingency table (annotator 1 in the rows, annotator 2 in the columns):

        A     B   Total
A      47     9    56
B       3    41    44
Total  50    50   100

Agreement? $\frac{47 + 41}{100} = .88$
This proportion is the observed agreement (also called percent agreement).
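
Observed agreement is straightforward to compute; a minimal sketch (function name ours), checked against the six-item toy example above, where the annotators agree on items 3, 5 and 6:

```python
def observed_agreement(labels1, labels2):
    """Proportion of items for which two annotators chose the same label."""
    assert len(labels1) == len(labels2)
    matches = sum(l1 == l2 for l1, l2 in zip(labels1, labels2))
    return matches / len(labels1)

# The six items from the 'Comparing two annotators' slide:
ann1 = ["A", "B", "A", "A", "B", "B"]
ann2 = ["B", "A", "A", "B", "B", "B"]
print(observed_agreement(ann1, ann2))  # 0.5
```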

Chance agreement
Some agreement is to be expected simply by chance:
- e.g., two annotators who annotate A or B by chance agree approximately half of the time.
- The amount of chance agreement depends on the annotation scheme and the annotated data.
- Sensible agreement: the amount above chance.

Expected agreement
Observed agreement ($A_o$): the amount of actual agreement.
Expected agreement ($A_e$): the expected value of $A_o$ under chance annotation.
Agreement above chance: $A_o - A_e$
Maximally possible agreement above chance: $1 - A_e$
Proportion of sensible agreement: $\frac{A_o - A_e}{1 - A_e}$
Question: how do we compute chance agreement ($A_e$)? There are different ways...
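
Every measure that follows plugs a different $A_e$ into the same chance-correction formula; a tiny sketch (our naming):

```python
def chance_corrected(a_o, a_e):
    """Proportion of sensible agreement: (A_o - A_e) / (1 - A_e)."""
    return (a_o - a_e) / (1 - a_e)

# With A_o = .88 and a chance agreement of .5:
print(chance_corrected(0.88, 0.5))  # 0.76
```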

Measure S: considers the number of categories
S assumes the same chance for all annotators and categories.
Number of category labels: $q$
Probability that an annotator picks a particular category $q_a$: $\frac{1}{q}$
Probability that both annotators pick a particular category $q_a$: $\left(\frac{1}{q}\right)^2$
Probability that both annotators pick the same category: $A_e^S = q \cdot \left(\frac{1}{q}\right)^2 = \frac{1}{q}$
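
In code, S needs nothing beyond the number of categories (a sketch; the helper name is ours):

```python
def s_measure(labels1, labels2, num_categories):
    """Measure S: uniform chance agreement A_e = 1/q over q categories."""
    a_o = sum(l1 == l2 for l1, l2 in zip(labels1, labels2)) / len(labels1)
    a_e = 1 / num_categories
    return (a_o - a_e) / (1 - a_e)

# Reproducing the worked example on the next slide (A_o = .88):
for q in (2, 4):
    a_e = 1 / q
    print(q, round((0.88 - a_e) / (1 - a_e), 2))  # 2 -> 0.76, 4 -> 0.84
```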

Are the categories equally likely?
Two contingency tables with the same observed agreement but different numbers of categories:
With two categories (A, B): $A_o = .88$, $A_e = \frac{1}{2} = .5$, $S = \frac{.88 - .5}{1 - .5} = .76$
With four categories (A, B, C, D): $A_o = .88$, $A_e = \frac{1}{4} = .25$, $S = \frac{.88 - .25}{1 - .25} = .84$
The same observed agreement yields a higher S simply because more categories are available.

π: different chance for different categories
Scott's π assumes different chance for different categories (Scott 1955).
Number of annotations: $N$
Number of annotations with category $q_a$: $n_{q_a}$
Probability that an annotator picks a particular category $q_a$: $\frac{n_{q_a}}{N}$
Probability that both annotators pick a particular category $q_a$: $\left(\frac{n_{q_a}}{N}\right)^2$
Probability that both annotators pick the same category: $A_e^\pi = \sum_q \left(\frac{n_q}{N}\right)^2 = \frac{1}{N^2} \sum_q n_q^2$
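
A direct transcription of the formula (a sketch; names ours), pooling both annotators' labels to estimate the category distribution:

```python
from collections import Counter

def scott_pi(labels1, labels2):
    """Scott's pi: chance agreement from the pooled label distribution."""
    i = len(labels1)
    a_o = sum(l1 == l2 for l1, l2 in zip(labels1, labels2)) / i
    pooled = Counter(labels1) + Counter(labels2)  # N = 2i annotations in total
    a_e = sum((n / (2 * i)) ** 2 for n in pooled.values())
    return (a_o - a_e) / (1 - a_e)

ann1 = ["A", "B", "A", "A", "B", "B"]
ann2 = ["B", "A", "A", "B", "B", "B"]
print(round(scott_pi(ann1, ann2), 3))  # -0.029
```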

Comparison of S and π
Two three-category (A, B, C) scenarios with the same observed agreement:
Scenario 1: $A_o = .88$, $S = \frac{.88 - 1/3}{1 - 1/3} = .82$, $\pi = .76$
Scenario 2 (more skewed category distribution): $A_o = .88$, $S = .82$, $\pi = .647$
S only sees the number of categories, so it cannot tell the two scenarios apart; π can.

Prevalence
Imagine: two annotators disambiguate 1000 instances of love: emotion vs. zero (as in tennis).
Each annotator found 995 instances of emotion and 5 of zero, but in different cases.
How useful are these annotations?

          emotion  zero  Total
emotion       990     5    995
zero            5     0      5
Total         995     5   1000

$A_o = .99$, $S = \frac{.99 - .5}{1 - .5} = .98$, $\pi = \frac{.99 - .99005}{1 - .99005} \approx -.005$
Despite near-perfect raw agreement, π is at (in fact slightly below) zero: the annotators never agree on the rare category.
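
The prevalence effect is easy to reproduce (a self-contained sketch; the item ordering below is one arrangement in which the two annotators' five zero cases never coincide):

```python
from collections import Counter

# Annotator 1: items 1-995 'emotion', items 996-1000 'zero'.
# Annotator 2: items 1-5 'zero', items 6-1000 'emotion'.
a1 = ["emotion"] * 995 + ["zero"] * 5
a2 = ["zero"] * 5 + ["emotion"] * 995

i = len(a1)
a_o = sum(x == y for x, y in zip(a1, a2)) / i           # 0.99
pooled = Counter(a1) + Counter(a2)
a_e = sum((n / (2 * i)) ** 2 for n in pooled.values())  # 0.99005
print(round((a_o - a_e) / (1 - a_e), 3))                # -0.005
```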

Kappa: considers individual bias
Cohen's κ assumes that different annotators have different interpretations of the guidelines (bias/prejudice) (Cohen 1960; Carletta 1996).
Total number of items/markables: $i$
Probability that annotator $c_x$ picks a particular category $q_a$: $\frac{n_{c_x q_a}}{i}$
Probability that both annotators pick a particular category $q_a$: $\frac{n_{c_1 q_a}}{i} \cdot \frac{n_{c_2 q_a}}{i}$
Probability that both annotators pick the same category: $A_e^\kappa = \sum_q \frac{n_{c_1 q}}{i} \cdot \frac{n_{c_2 q}}{i} = \frac{1}{i^2} \sum_q n_{c_1 q} \, n_{c_2 q}$
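
Cohen's κ differs from π only in how $A_e$ is estimated: per annotator instead of pooled (a sketch; names ours):

```python
from collections import Counter

def cohen_kappa(labels1, labels2):
    """Cohen's kappa: chance agreement from each annotator's own distribution."""
    i = len(labels1)
    a_o = sum(l1 == l2 for l1, l2 in zip(labels1, labels2)) / i
    c1, c2 = Counter(labels1), Counter(labels2)
    a_e = sum((c1[q] / i) * (c2[q] / i) for q in set(c1) | set(c2))
    return (a_o - a_e) / (1 - a_e)

ann1 = ["A", "B", "A", "A", "B", "B"]
ann2 = ["B", "A", "A", "B", "B", "B"]
print(cohen_kappa(ann1, ann2))  # 0.0: agreement exactly at chance level
```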

Comparison of S, π and κ
Example 1, the running 100-item table from the Observed agreement slide:
$A_o = .88$, $S = .76$, $\pi = .759$, $\kappa = .76$
Example 2, two annotators with strongly different label preferences:
$A_o = .3$, $S = -.4$, $\pi = -.414$, $\kappa = .129$

Comparison of π and κ
It can be proven that for any sample: $\pi \le \kappa$.
If annotators interpret the guidelines differently, that is a problem, and it is mirrored by π (but partly factored out by κ).
With many annotators, the difference between π and κ is small.
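
A small constructed check (toy data of our own): two annotators with opposite label preferences. Their bias drives π well below κ, illustrating $\pi \le \kappa$:

```python
from collections import Counter

# Annotator 1 prefers label A, annotator 2 prefers label B;
# they agree on 4 of 10 items (A_o = 0.4).
a1 = ["A"] * 8 + ["B"] * 2
a2 = ["A"] * 2 + ["B"] * 8

i = len(a1)
a_o = sum(x == y for x, y in zip(a1, a2)) / i
c1, c2 = Counter(a1), Counter(a2)

a_e_pi = sum((n / (2 * i)) ** 2 for n in (c1 + c2).values())           # 0.5
a_e_kappa = sum((c1[q] / i) * (c2[q] / i) for q in set(c1) | set(c2))  # 0.32
print((a_o - a_e_pi) / (1 - a_e_pi))        # pi    = -0.2
print((a_o - a_e_kappa) / (1 - a_e_kappa))  # kappa ~  0.118
```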

Interpreting agreement scores
$\kappa = 0$: no agreement above chance; $\kappa = 1$: perfect agreement.
$\kappa < .7$ is often considered bad agreement (controversial).
According to Landis and Koch (1977):
$\kappa < 0$: poor agreement
$.00 - .20$: slight
$.21 - .40$: fair
$.41 - .60$: moderate
$.61 - .80$: substantial
$.81 - 1.00$: almost perfect
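
The scale is easy to encode as a lookup (a sketch using the band boundaries above):

```python
def interpret_kappa(kappa):
    """Verbal label for an agreement score, after Landis and Koch (1977)."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"

print(interpret_kappa(0.76))  # substantial
```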

Multiple annotators (> 2)
Either: average the pairwise agreement scores.
Or: use dedicated measures, such as Fleiss' κ.
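
In practice these measures are rarely implemented from scratch. NLTK's agreement module, for example, accepts (coder, item, label) triples for any number of annotators; a sketch, assuming a current NLTK installation (method names as in NLTK's metrics.agreement API):

```python
from nltk.metrics.agreement import AnnotationTask

# (coder, item, label) triples: three annotators tagging four items.
data = [
    ("c1", "i1", "NN"), ("c2", "i1", "NN"), ("c3", "i1", "NN"),
    ("c1", "i2", "VB"), ("c2", "i2", "VB"), ("c3", "i2", "NN"),
    ("c1", "i3", "JJ"), ("c2", "i3", "NN"), ("c3", "i3", "JJ"),
    ("c1", "i4", "NN"), ("c2", "i4", "NN"), ("c3", "i4", "NN"),
]
task = AnnotationTask(data)
print(task.avg_Ao())       # average pairwise observed agreement
print(task.pi())           # multi-annotator pi (Fleiss-style generalization)
print(task.kappa())        # average pairwise Cohen's kappa
print(task.multi_kappa())  # Davies & Fleiss' multi-annotator kappa
```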

Online tools

References
Artstein, R. and M. Poesio (2008). Inter-coder agreement for computational linguistics (survey article). Computational Linguistics 34(4).
Carletta, J. (1996). Assessing agreement on classification tasks: the kappa statistic. Computational Linguistics 22(2).
Carpenter, B. and M. Poesio (2010). Models of data annotation. malta-2010-slides.pdf. Slides from the LREC 2010 tutorial (part II).
Cohen, J. (1960). A coefficient of agreement for nominal scales. Educational and Psychological Measurement 20(1).
Landis, J. R. and G. G. Koch (1977). The measurement of observer agreement for categorical data. Biometrics 33(1).
Poesio, M. and B. Carpenter (2010). Statistical models of the annotation process. lrec-sli.pdf. Slides from the LREC 2010 tutorial (part I).
Scott, W. A. (1955). Reliability of content analysis: The case of nominal scale coding. Public Opinion Quarterly 19(3).
