Data for linguistics ALEXIS DIMITRIADIS
1 Data for linguistics ALEXIS DIMITRIADIS
2 Text, corpora, and data in the wild
3 1. Where does language data come from? The usual: introspection, questionnaires, etc. Corpora, suited to the domain of study: written, spoken, child-directed, etc. Databases with already-distilled information. The usual: Google.
4 2. Searching for real examples. Using Google, we can check whether particular constructions are really impossible: (1) a. John saw a snake near him. b. *John saw a snake near himself. Comparing Google hits for "near him" and "near himself" gives a ratio of 141:1, so "near himself" accounts for about 0.7% of the combined hits. Small, but not negligible! What are the hits like?
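The percentage quoted for the 141:1 ratio can be checked with a few lines of Python. This is only a sketch: the function name `minority_share` and the hit counts are placeholders chosen to reproduce the ratio, not the actual Google counts from the slide.

```python
def minority_share(common_hits: int, rare_hits: int) -> float:
    """Return the rare variant's share of all hits, as a percentage."""
    total = common_hits + rare_hits
    if total == 0:
        raise ValueError("no hits at all")
    return 100.0 * rare_hits / total


# Placeholder counts (an assumption): any pair in a 141:1 ratio gives the same share.
common, rare = 141_000, 1_000
print(f"ratio {common / rare:.0f}:1, share {minority_share(common, rare):.2f}%")
```

The share is computed against the combined hits (1 in 142), which is why it comes out at roughly 0.7%.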
5 Top hits for "near himself": archaic/biblical examples, but not only those.
7 3. Structured corpora. In the domain of Language Resources, high-quality corpora and other resources are created for various (usually computational) purposes. Considerations: selection of materials in the corpus (quality, balance); clean-up, segmentation into sentences, metadata; tagging, parsing, and other annotations. Other language resources: parallel corpora, dictionaries, collocation lists, wordnets, ... Corpora are easy to find on the web. Many cost money; some only require registration.
8 4. Some free corpora. Brown corpus (1961): 500 sources, categorized in diverse genres; one million words. Project Gutenberg: contains 25,000 free electronic books. Countless other specialized corpora: legal texts, medical texts, child language, L2 learners of English, ... Child Language Data Exchange System (CHILDES).
9 5. The British National Corpus (BNC). A 100-million-word collection from a wide range of sources, written and spoken. Designed to represent a wide cross-section of British English from the late 20th century. Written part (90%): extracts from newspapers, periodicals for all ages and interests, academic books and popular fiction, published and unpublished letters, etc. Spoken part (10%): transcriptions of unscripted informal conversations and spoken language, in contexts ranging from formal business or government meetings to radio shows and phone-ins. Tagged (automatically) with part of speech.
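Part-of-speech tags make it possible to search for a construction rather than a literal string. The sketch below assumes a simplified `word_TAG` text format, which is not the BNC's actual distribution format, and uses tag names modeled on the CLAWS C5 tagset (PRP for preposition, PNX for reflexive pronoun). The function names are illustrative only.

```python
def parse_tagged(line):
    """Split 'word_TAG word_TAG ...' into (word, tag) pairs."""
    return [tuple(tok.rsplit("_", 1)) for tok in line.split()]


def find_tag_bigrams(tokens, first_tag, second_tag):
    """Return adjacent word pairs whose tags match the two tags, in order."""
    return [
        (w1, w2)
        for (w1, t1), (w2, t2) in zip(tokens, tokens[1:])
        if t1 == first_tag and t2 == second_tag
    ]


# Hand-tagged sample sentence (an assumption, not real BNC data).
sample = "John_NP0 saw_VVD a_AT0 snake_NN1 near_PRP himself_PNX"
tokens = parse_tagged(sample)

# Find every preposition immediately followed by a reflexive pronoun.
print(find_tag_bigrams(tokens, "PRP", "PNX"))
```

Searching on tags finds "near himself", "beside herself", and so on in one query, which a plain string search cannot do.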
10 6. Dutch corpora. From the Instituut voor Nederlandse Lexicologie (INL): 38 Miljoen Woorden Corpus (and many smaller collections). Corpus Gesproken Nederlands (CGN). Alpino treebank (parsed corpus): more than 150,000 words.
11 7. Rolling our own. With so much data on the web, it's easy to collect as much data as a linguist could conceivably need. Automatic annotation tools can help us search our data more easily. Compiling and annotating a corpus does require a time investment. An on-line tagger for Dutch text (when it works).
12 8. The power of the web-crawling approach: Online Database of Interlinear Text (ODIN).
13 Data and databases
14 9. Managing linguistic data. (1) Keep it in Word documents: easy to get started; can store any kind of information. But: hard to count, sort, or get an overview of the contents, and only one person at a time can edit the data. (2) Use a spreadsheet (Excel): can store tabular information (only), sort it, and calculate statistics; simple queries. But: limited options for display, only suited to tabular data, and one editor at a time. (3) Use a database: powerful, with open-ended display, collaborative data entry, and full queries. But: complex to set up, and the structure imposed on the contents may be too restrictive.
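To make the database option concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The table layout (`example` with `language`, `sentence`, `judgment` columns) and the helper names are assumptions for illustration, not the design of any database mentioned in these slides.

```python
import sqlite3


def build_example_db(rows):
    """Create an in-memory database holding (language, sentence, judgment) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE example (
               id INTEGER PRIMARY KEY,
               language TEXT NOT NULL,
               sentence TEXT NOT NULL,
               judgment TEXT NOT NULL
           )"""
    )
    conn.executemany(
        "INSERT INTO example (language, sentence, judgment) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()
    return conn


def count_by_judgment(conn):
    """Count the stored sentences per acceptability judgment."""
    query = (
        "SELECT judgment, COUNT(*) FROM example "
        "GROUP BY judgment ORDER BY judgment"
    )
    return conn.execute(query).fetchall()


conn = build_example_db([
    ("English", "John saw a snake near him", "ok"),
    ("English", "John saw a snake near himself", "*"),
])
print(count_by_judgment(conn))
```

Counting and sorting, which are hard in a Word document, become one-line queries here. The price is the fixed column structure, which is the restrictiveness the slide warns about.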
15 Managing linguistic data II. For messier data collections, we need more flexibility: keep the data in text files (not Word documents), and search and manage the data as needed, using a variety of tools. Python is a flexible programming language; the Natural Language Toolkit (NLTK) gives us numerous tools we can use to explore text. It is still difficult for multiple people to work on the same collection, but not as difficult as with a single document.
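As a rough illustration of exploring plain text with Python, the sketch below uses only the standard library (NLTK offers richer versions of both steps). The helper names `tokenize` and `concordance` and the two-sentence sample are assumptions; a real collection would be read from a text file instead.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def concordance(text, phrase, width=20):
    """Return (left context, match, right context) for each whole-word hit."""
    pattern = r"\b" + re.escape(phrase) + r"\b"
    hits = []
    for m in re.finditer(pattern, text, flags=re.IGNORECASE):
        left = text[max(0, m.start() - width):m.start()]
        right = text[m.end():m.end() + width]
        hits.append((left, m.group(0), right))
    return hits


# Stand-in for the contents of a text file (an assumption).
text = "John saw a snake near him. He put the basket near himself."

print(Counter(tokenize(text)).most_common(3))
for left, match, right in concordance(text, "near him"):
    print(f"{left!r} [{match}] {right!r}")
```

The word boundaries in the pattern keep "near him" from also matching inside "near himself", so the two variants can be counted separately.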
16 10. Notable cross-linguistic databases. Directory of the world's languages: The World Atlas of Language Structures (WALS) A collection of typological databases: New server: A different kind of collection: Online Database of Interlinear Text (ODIN)
17 More cross-linguistic databases A simple, focused cross-linguistic survey: The Berlin intensifier database Some more sophisticated examples: The Surrey databases Our own reciprocals database:
18 Contents: 1. Where does language data come from? 2. Searching for real examples. 3. Structured corpora. 4. Some free corpora. 5. The British National Corpus (BNC). 6. Dutch corpora. 7. Rolling our own. 8. The power of the web-crawling approach. 9. Managing linguistic data. 10. Notable cross-linguistic databases.