Atlantic Provinces Library Association Lisa Goddard Memorial University Libraries May 2011

1 Are We Ready for the Digital Humanities? Atlantic Provinces Library Association. Lisa Goddard, Memorial University Libraries. May 2011.

2 What are the Digital Humanities? How can libraries support DH?

3 Humanities: philosophy, literature, religion, art, music, history and language. Core library users. Core print users.

4 Digital Humanities: opening up new knowledge and new ways of learning by applying digital technologies to any humanities subject. DH is about creating digital toolsets that let users undertake new forms of humanities research. Growing interest and investment.

5 George Mason University: Center for History and New Media (CHNM)

6 Stanford Literary Lab

7 University of Virginia: Scholars' Lab

8 Digital Humanities in Canada: U Alberta, U Toronto, U Victoria, UNB, U Montreal, McMaster, McGill, York

9 Digital Humanities Projects

10 Thematic Digital Archives

11 Virtual Anthologies

12 Aggregate, Annotate, Review

13 GIS & Mapping

14 Clustering & Visualization

15 Text Transcription & Markup

16 Edition Comparisons

17 3D Sculpture Modeling

18 Tool Building

19 Anatomy of a DH Project. Source: Dunning et al., "Freeing up digital content with text mining," Serials 22(2), July 2009.

20 Text Corpus: 17th-century English news pamphlets, December 1653 to May 1654. British Library electronic texts. 312 documents, words

21 CLAWS: Part-of-Speech Tagging. A grammatical tagging tool from linguistics that identifies proper nouns, common nouns, plural nouns, adjectives, and prepositions with 95-97% accuracy.
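
CLAWS output attaches a tag to every token. A minimal Python sketch, assuming the horizontal `word_TAG` output style and a hand-written sample sentence (not real CLAWS output), that tallies the word classes named above. The tag-to-class table is an assumption based on the CLAWS C7 tagset (NP1, NN1, NN2, JJ, II) and should be checked against the tagset documentation before use.

```python
from collections import Counter

# Assumed mapping from a few CLAWS C7 tags to the word classes on the slide.
TAG_CLASSES = {
    "NP1": "proper noun",
    "NN1": "common noun",
    "NN2": "plural noun",
    "JJ": "adjective",
    "II": "preposition",
}


def parse_tagged(text: str) -> list[tuple[str, str]]:
    """Split horizontal 'word_TAG' output into (word, tag) pairs."""
    pairs = []
    for token in text.split():
        # rpartition keeps any underscores inside the word itself
        word, _, tag = token.rpartition("_")
        pairs.append((word, tag))
    return pairs


def count_classes(text: str) -> Counter:
    """Count tokens per word class, ignoring tags not in TAG_CLASSES."""
    return Counter(
        TAG_CLASSES[tag] for _, tag in parse_tagged(text) if tag in TAG_CLASSES
    )


# Hypothetical tagged sentence, written by hand for illustration.
sample = "Two_MC armed_JJ ships_NN2 sailed_VVD from_II Dunkirk_NP1 to_II Middleton_NP1"
counts = count_classes(sample)
print(counts)
```

Tags outside the table (here `MC` and `VVD`) are simply skipped, so the table can be extended one class at a time.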

22 Geographical Analysis

23 Geographical Analysis

24 USAS: Semantic Parsing
two_N1 ships_M4 from_Z5 Dunkirk_Z2 have_Z5 brought_M2 Men_S2.2m Arms_B1 ,_PUNC and_Z5 Ammunition_G3 to_Z5 Middleton_Z1mf
Ships = M4 (shipping, swimming, etc.)
Men = S2.2m (People: Male)
Ammunition = G3 (warfare, defence and the army; weapons)
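
Each word in USAS output carries a semantic tag, and a tag can add modifier letters such as the `m` in `S2.2m`. A minimal Python sketch that splits the slide's own sentence (tags written in upper case) and groups words by semantic field. The regular expression is a simplified reading of the tag shape, an assumption rather than the official USAS tag grammar.

```python
import re
from collections import defaultdict

# The tagged sentence from the slide.
TAGGED = (
    "two_N1 ships_M4 from_Z5 Dunkirk_Z2 have_Z5 brought_M2 Men_S2.2m "
    "Arms_B1 ,_PUNC and_Z5 Ammunition_G3 to_Z5 Middleton_Z1mf"
)

# Simplified tag shape: letter, number, optional '.number' subdivisions,
# then optional modifiers (e.g. 'm' = male, 'f' = female, '+' / '-').
TAG_RE = re.compile(r"^([A-Z]\d+(?:\.\d+)*)([a-z+\-]*)$")


def split_tag(tag):
    """Return (semantic field, modifiers); non-semantic tags pass through."""
    match = TAG_RE.match(tag)
    if match is None:
        return tag, ""
    return match.group(1), match.group(2)


def words_by_field(tagged):
    """Group words under their semantic field, e.g. 'G3' -> ['Ammunition']."""
    fields = defaultdict(list)
    for token in tagged.split():
        word, _, tag = token.rpartition("_")
        field, _ = split_tag(tag)
        fields[field].append(word)
    return dict(fields)


fields = words_by_field(TAGGED)
print(fields["G3"])  # -> ['Ammunition']
print(fields["Z2"])  # -> ['Dunkirk']
```

Keeping modifiers separate means `Men_S2.2m` is filed under the same `S2.2` field as an unmarked `S2.2` word would be.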

25 GIS Mapping: Topic War

26 GIS Mapping: Topic Money
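
Topic maps like these need, for each place, a measure of how often it is mentioned alongside a topic. One simple way to get that, offered here as an assumption rather than the project's documented method, is to count the sentences in which a place name (USAS `Z2`) appears together with a word from the topic's semantic field (for war, `G3`). A minimal Python sketch over hand-written tagged sentences; joining the counts to gazetteer coordinates for display in a GIS is not shown.

```python
from collections import Counter


def place_topic_counts(sentences, topic_prefix, place_tag="Z2"):
    """Count, per place name, the sentences in which it appears together
    with at least one word whose USAS tag starts with topic_prefix.

    Each sentence is a string of 'word_TAG' tokens.
    """
    counts = Counter()
    for sentence in sentences:
        places = set()
        has_topic = False
        for token in sentence.split():
            word, _, tag = token.rpartition("_")
            if tag.startswith(place_tag):
                places.add(word)
            elif tag.startswith(topic_prefix):
                has_topic = True
        if has_topic:
            counts.update(places)  # each place counted once per sentence
    return counts


# Hand-written tagged sentences, for illustration only.
sentences = [
    "two_N1 ships_M4 from_Z5 Dunkirk_Z2 have_Z5 brought_M2 Ammunition_G3",
    "Men_S2.2m from_Z5 Dunkirk_Z2 and_Z5 London_Z2",
    "Ammunition_G3 to_Z5 London_Z2",
]
war = place_topic_counts(sentences, "G3")
print(war)
```

Counting per sentence rather than per token keeps one long war report from outweighing many short mentions; a fixed word window would be an equally plausible choice.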

27 DH Growing Pains

28 DH Library Fears: "The humanists are leaving us for the computer scientists."

29 Digital Humanities Sources

30 Primary source materials

31 Primary source materials

32 Primary source materials

33 Primary source materials

34 Large text corpora

35 Large text corpora

36 Large text corpora

37 Large text corpora

38 Large text corpora: proposed settlement; in-copyright works owned by universities; non-consumptive purposes; one or two centres.

39 Linguistic Corpora

40 Historical Linguistic Corpora

41 Historical GIS Data

42 Historical Census Datasets

43 Etext Preferences: Cost

44 Etext Preferences: Quality

45 Etext Preferences: Availability

46 Licensing & Rights: multi-layered sources with different licensing conditions; copyright in annotations, data sets, and other user-generated information; a need for clearly expressed rights information.
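
Clearly expressed rights can be made concrete by recording a rights holder and a licence for every layer of a resource and flagging the gaps. A minimal Python sketch; the `Layer` structure, layer names and sample values are hypothetical, and the licence strings are SPDX identifiers used only as an example vocabulary.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Layer:
    """One layer of a multi-layered DH resource (hypothetical structure)."""
    name: str
    rights_holder: Optional[str]
    licence: Optional[str]  # e.g. an SPDX identifier such as "CC-BY-4.0"


def missing_rights(layers):
    """Return the names of layers whose rights holder or licence is not stated."""
    return [layer.name for layer in layers
            if not (layer.rights_holder and layer.licence)]


# Hypothetical resource: page images, a transcription, and user contributions.
resource = [
    Layer("page images", "Holding library", "CC-BY-4.0"),
    Layer("transcription", "Project team", "CC-BY-SA-4.0"),
    Layer("user annotations", None, None),
    Layer("derived data set", "Project team", None),
]
print(missing_rights(resource))
```

Treating each layer separately mirrors the slide's point that one resource can combine several licensing conditions.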

47 Copyright review project

48 Library Role: Sources. Acquire and preserve primary sources; digitization and transcription; open, flexible formats; large text aggregations; data sets; licensing and copyright.

49 Digital Humanities Metadata

50 Traditional Metadata: access to print collections. Authorities; thesauri; bibliographies; indexes; concordances.

51 Finding Aids: Primary Sources

52 Finding Aids: Data Sets

53 Name Authorities

54 Name Authorities

55 Specialized Thesauri

56 Full-Text Markup: Text Encoding Initiative (TEI)

57 Metadata: TEI
<div n="castlist" type="dramatispersonae" org="uniform" sample="complete">
  <castList>
    <head>Dramatis Personae</head>
    <castItem type="role">
      <role xml:id="wag">Wagner</role>
    </castItem>
  </castList>
</div>

58 Metadata: TEI
<sp who="#wag">
  <lb xml:id="l204"/>
  <p>For is he not <foreign xml:lang="la">Corpus naturale</foreign>?</p>
</sp>
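
The `who` attribute on `<sp>` points back to the `xml:id` of a `<role>` in the cast list, so speeches can be gathered by character. A minimal Python sketch using the standard library's `xml.etree.ElementTree`, run over a simplified, hand-assembled fragment that is not schema-validated. It assumes the TEI P5 convention that `who` holds `#`-prefixed pointers, and also accepts a bare `wag`.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Simplified fragment combining a cast list and one speech.
SAMPLE = """<div xmlns="http://www.tei-c.org/ns/1.0">
  <castList>
    <castItem type="role"><role xml:id="wag">Wagner</role></castItem>
  </castList>
  <sp who="#wag">
    <p>For is he not <foreign xml:lang="la">Corpus naturale</foreign>?</p>
  </sp>
</div>"""


def speeches_by_role(xml_text):
    """Map each role name to the plain text of the speeches attributed to it."""
    root = ET.fromstring(xml_text)
    roles = {r.get(XML_ID): "".join(r.itertext()) for r in root.iter(TEI + "role")}
    result = {}
    for sp in root.iter(TEI + "sp"):
        # Normalise whitespace left over from the pretty-printed markup.
        text = " ".join("".join(sp.itertext()).split())
        # 'who' may list several pointers; strip the '#' to get the xml:id.
        for ref in sp.get("who", "").split():
            name = roles.get(ref.lstrip("#"), ref)
            result.setdefault(name, []).append(text)
    return result


print(speeches_by_role(SAMPLE))
```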

59 Metadata: TEI
<l xml:id="l26">Nothing so sweet as <choice><orig>magicke</orig><reg>magic</reg></choice> is to him;</l>
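
Because `<choice>` keeps both the original spelling and its regularization, one encoded text can yield two reading texts. A minimal Python sketch with `xml.etree.ElementTree` that picks either the `<orig>` or the `<reg>` child of every `<choice>`; it ignores any `{namespace}` prefix so it works with or without the TEI namespace. The function names are illustrative.

```python
import xml.etree.ElementTree as ET


def local(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def reading_text(elem, prefer="reg"):
    """Return elem's text, taking only the <orig> or <reg> child of each <choice>."""
    parts = [elem.text or ""]
    for child in elem:
        if local(child.tag) == "choice":
            for alt in child:
                if local(alt.tag) == prefer:
                    parts.append(reading_text(alt, prefer))
        else:
            parts.append(reading_text(child, prefer))
        parts.append(child.tail or "")  # text following the child element
    return "".join(parts)


line = ET.fromstring(
    '<l xml:id="l26">Nothing so sweet as <choice><orig>magicke</orig>'
    "<reg>magic</reg></choice> is to him;</l>"
)
print(reading_text(line, "orig"))  # Nothing so sweet as magicke is to him;
print(reading_text(line, "reg"))   # Nothing so sweet as magic is to him;
```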

60 Metadata: TEI
<metDecl pattern="((+ -)+\?/?)*">
  <metSym value="trochee" terminal="false">+-</metSym>
  <metSym value="iamb" terminal="false">-+</metSym>
  <metSym value="spondee" terminal="false">++</metSym>
  <metSym value="pyrrhic" terminal="false">--</metSym>
  <metSym value="amphibrach" terminal="false">-+-</metSym>
  <metSym value="anapaest" terminal="false">--+</metSym>
  <metSym value="+">metrical prominence</metSym>
  <metSym value="-">metrical non-prominence</metSym>
  <metSym value=" ">foot boundary</metSym>
  <metSym value="/">metrical line boundary</metSym>
</metDecl>
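
The non-terminal `metSym` entries give each foot name an expansion in `+`/`-` notation, so a declaration like this can drive automatic labelling of a scansion. A minimal Python sketch that reads the foot definitions from a simplified copy of the declaration (the `pattern` attribute is omitted) and names the feet in a scansion string, using a space as the foot boundary and `/` as the line boundary, as declared above. The function names are illustrative.

```python
import xml.etree.ElementTree as ET

# Simplified copy of the foot definitions above (no namespace, no pattern).
MET_DECL = """<metDecl>
  <metSym value="trochee" terminal="false">+-</metSym>
  <metSym value="iamb" terminal="false">-+</metSym>
  <metSym value="spondee" terminal="false">++</metSym>
  <metSym value="pyrrhic" terminal="false">--</metSym>
  <metSym value="amphibrach" terminal="false">-+-</metSym>
  <metSym value="anapaest" terminal="false">--+</metSym>
</metDecl>"""


def foot_names(decl_xml):
    """Map each non-terminal symbol's expansion (e.g. '-+') to its name."""
    root = ET.fromstring(decl_xml)
    return {
        sym.text.strip(): sym.get("value")
        for sym in root.iter("metSym")
        if sym.get("terminal") == "false"
    }


def scan(notation, names):
    """Name the feet in a scansion where ' ' separates feet and '/' ends a line."""
    lines = []
    for line in notation.split("/"):
        feet = line.split()
        if feet:
            lines.append([names.get(foot, "unknown") for foot in feet])
    return lines


names = foot_names(MET_DECL)
print(scan("-+ -+ -+ -+ -+ / --+ --+", names))
```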

61 Metadata: TEI
<l rend="font-size(100%) indent(5px)">both go</l>
<l rend="font-size(100%) indent(-7px)">to law:</l>
<l rend="font-size(100%) indent(-23px)"><hi rend="italic">I</hi> will</l>
<l rend="font-size(100%) indent(-26px)">prosecute</l>
<l rend="font-size(90%) indent(-40px)"><hi rend="italic">you.</hi></l>
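
TEI leaves the values of `rend` to each project, so a display layer has to translate them. A minimal Python sketch that turns `rend` strings like the ones above into CSS declarations; the mapping of `indent(...)` to `margin-left` and of `italic` to `font-style` is an assumption for illustration, not part of the TEI standard.

```python
import re

# Hypothetical mapping: TEI does not define @rend values, so these
# translations are assumptions, not part of the TEI standard.
FUNCTION_TO_PROPERTY = {"font-size": "font-size", "indent": "margin-left"}
KEYWORD_TO_DECLARATION = {"italic": "font-style: italic"}

# Either name(argument) or a bare keyword such as 'italic'.
TOKEN_RE = re.compile(r"([\w-]+)\(([^)]*)\)|([\w-]+)")


def rend_to_css(rend):
    """Translate a rend string like 'font-size(90%) indent(-40px)' into CSS."""
    declarations = []
    for name, argument, keyword in TOKEN_RE.findall(rend):
        if name in FUNCTION_TO_PROPERTY:
            # Drop stray spaces, e.g. 'indent(- 26px)' -> '-26px'.
            value = argument.replace(" ", "")
            declarations.append(f"{FUNCTION_TO_PROPERTY[name]}: {value}")
        elif keyword in KEYWORD_TO_DECLARATION:
            declarations.append(KEYWORD_TO_DECLARATION[keyword])
    return "; ".join(declarations)


print(rend_to_css("font-size(100%) indent(-7px)"))
print(rend_to_css("italic"))
```

Unrecognised functions or keywords are skipped rather than passed through, so an unexpected `rend` value cannot inject arbitrary CSS.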

62 Controlled Vocabularies

63 Controlled Vocabularies

64 Metadata: Library Role. Metadata crosswalks; controlled vocabularies; thesauri and taxonomy; name authorities; text markup.
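
A metadata crosswalk maps fields in one schema onto another. A minimal Python sketch mapping four MARC 21 fields to simple Dublin Core elements, in the spirit of the Library of Congress MARC-to-Dublin-Core crosswalk but heavily simplified; the record is hypothetical and written by hand, and only a handful of mappings are shown.

```python
# A few MARC 21 (tag, subfield) -> simple Dublin Core mappings (simplified).
CROSSWALK = {
    ("100", "a"): "creator",   # main entry, personal name
    ("245", "a"): "title",     # title statement
    ("260", "c"): "date",      # date of publication
    ("650", "a"): "subject",   # topical subject
}


def marc_to_dc(fields):
    """Convert (tag, subfield, value) triples to a Dublin Core dict of lists."""
    record = {}
    for tag, code, value in fields:
        element = CROSSWALK.get((tag, code))
        if element is not None:
            # Trim ISBD-style punctuation such as a trailing ' /' or ','.
            record.setdefault(element, []).append(value.strip(" /:;,."))
    return record


# Hypothetical catalogue record, for illustration only.
marc = [
    ("100", "a", "Marlowe, Christopher,"),
    ("245", "a", "Doctor Faustus /"),
    ("260", "c", "1604."),
    ("650", "a", "Faust legend"),
    ("500", "a", "Local note."),  # no mapping: dropped
]
print(marc_to_dc(marc))
```

Fields without a mapping are dropped, which is the usual cost of crosswalking from a richer schema to a simpler one.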

65 Digital Humanities Preservation

66 Preservation: Library Role. Storage infrastructure; digital objects and texts; user-generated annotations; user-generated data sets; software environments and tools; preservation metadata; long-term access.
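
Fixity information is one common kind of preservation metadata: a checksum recorded at ingest and re-checked later shows whether a file has changed. A minimal Python sketch using `hashlib`; the field names loosely echo PREMIS fixity units (`messageDigestAlgorithm`, `messageDigest`), but the record is an illustrative dictionary, not a PREMIS serialization, and the identifier is hypothetical.

```python
import hashlib
from datetime import datetime, timezone


def fixity_record(identifier, data):
    """Build a small fixity record for a digital object's bytes.

    Field names loosely follow PREMIS; this is not a PREMIS serialization.
    """
    return {
        "objectIdentifier": identifier,
        "messageDigestAlgorithm": "SHA-256",
        "messageDigest": hashlib.sha256(data).hexdigest(),
        "recordedAt": datetime.now(timezone.utc).isoformat(),
    }


def verify(record, data):
    """Return True if the bytes still match the recorded digest."""
    return hashlib.sha256(data).hexdigest() == record["messageDigest"]


original = b"<TEI>sample transcription</TEI>"
rec = fixity_record("transcription-001.xml", original)
print(verify(rec, original))         # True
print(verify(rec, original + b" "))  # False
```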

67 Library Support for Digital Humanities: sources, licensing, digitization, metadata, GIS and data sharing, preservation.

68 Thank you. Lisa Goddard, Scholarly Communications Librarian


More information

CREATING DIGITAL REPOSITORIES PRESENTED BY CHAMA MPUNDU MFULA CHIEF LIBRARIAN NATIONAL ASSEMBLY OF ZAMBIA

CREATING DIGITAL REPOSITORIES PRESENTED BY CHAMA MPUNDU MFULA CHIEF LIBRARIAN NATIONAL ASSEMBLY OF ZAMBIA CREATING DIGITAL REPOSITORIES PRESENTED BY CHAMA MPUNDU MFULA CHIEF LIBRARIAN NATIONAL ASSEMBLY OF ZAMBIA Introduction Digital repositories (DR) are commonly referred to as institutional repositories or

More information

Data Curation Profile Plant Genomics

Data Curation Profile Plant Genomics Data Curation Profile Plant Genomics Profile Author Institution Name Contact J. Carlson Purdue University J. Carlson, jrcarlso@purdue.edu Date of Creation October 27, 2009 Date of Last Update Version 1.0

More information

Downloading Census Boundary Files Using Scholars GeoPortal Map and Data Library, Summer 2016

Downloading Census Boundary Files Using Scholars GeoPortal Map and Data Library, Summer 2016 Downloading Census Boundary Files Using Scholars GeoPortal Map and Data Library, Summer 2016 This exercise will showcase how to use Scholars GeoPortal to extract Census Tract (CT) census geography for

More information

Indexing Field Descriptions Recommended Practice

Indexing Field Descriptions Recommended Practice Indexing Field Descriptions Recommended Practice Service Alberta Enterprise Information Management Developed: Last Updated: https://www.alberta.ca/enterprise-information-management.aspx Contents Indexing...

More information

Community Tools and Best Practices for Harvesting and Preserving At-Risk Web Content ACA 2013

Community Tools and Best Practices for Harvesting and Preserving At-Risk Web Content ACA 2013 Community Tools and Best Practices for Harvesting and Preserving At-Risk Web Content ACA 2013 Scott Reed, Internet Archive Amanda Wakaruk, University of Alberta Libraries Kelly E. Lau, University of Alberta

More information

General information. Corpus of Spanish Golden-Age Sonnets

General information. Corpus of Spanish Golden-Age Sonnets Corpus of Spanish Golden-Age Sonnets, Borja Navarro Colorado, María Ribes Lafoz and Noelia Sánchez (ed.), 2015. https://github.com/bncolorado/corpussonetossiglodeoro (Last Accessed: 01.05.2017). Reviewed

More information

ANC2Go: A Web Application for Customized Corpus Creation

ANC2Go: A Web Application for Customized Corpus Creation ANC2Go: A Web Application for Customized Corpus Creation Nancy Ide, Keith Suderman, Brian Simms Department of Computer Science, Vassar College Poughkeepsie, New York 12604 USA {ide, suderman, brsimms}@cs.vassar.edu

More information

The IEEE Metadata Standard for Supporting Big Data Management

The IEEE Metadata Standard for Supporting Big Data Management The IEEE Metadata Standard for Supporting Big Data Management Alex MH Kuo 1,2 (Ph.D) 1 School of Health Information Science University of Victoria, BC, Canada. 2 CEDAR, School of Medicine University of

More information

An e-infrastructure for Language Documentation on the Web

An e-infrastructure for Language Documentation on the Web An e-infrastructure for Language Documentation on the Web Gary F. Simons, SIL International William D. Lewis, University of Washington Scott Farrar, University of Arizona D. Terence Langendoen, National

More information

Long-term preservation for INSPIRE: a metadata framework and geo-portal implementation

Long-term preservation for INSPIRE: a metadata framework and geo-portal implementation Long-term preservation for INSPIRE: a metadata framework and geo-portal implementation INSPIRE 2010, KRAKOW Dr. Arif Shaon, Dr. Andrew Woolf (e-science, Science and Technology Facilities Council, UK) 3

More information

The What, Why, Who and How of Where: Building a Portal for Geospatial Data. Alan Darnell Director, Scholars Portal

The What, Why, Who and How of Where: Building a Portal for Geospatial Data. Alan Darnell Director, Scholars Portal The What, Why, Who and How of Where: Building a Portal for Geospatial Data Alan Darnell Director, Scholars Portal What? Scholars GeoPortal Beta release Fall 2011 Production release March 2012 OLITA Award

More information

Google Scholar, Sci-Hub and LibGen: Could they be our New Partners?

Google Scholar, Sci-Hub and LibGen: Could they be our New Partners? Purdue University Purdue e-pubs Proceedings of the IATUL Conferences 2017 IATUL Proceedings Google Scholar, Sci-Hub and LibGen: Could they be our New Partners? Louis Houle McGill University (Canada) Louis

More information

Final Project Discussion. Adam Meyers Montclair State University

Final Project Discussion. Adam Meyers Montclair State University Final Project Discussion Adam Meyers Montclair State University Summary Project Timeline Project Format Details/Examples for Different Project Types Linguistic Resource Projects: Annotation, Lexicons,...

More information

A cocktail approach to the VideoCLEF 09 linking task

A cocktail approach to the VideoCLEF 09 linking task A cocktail approach to the VideoCLEF 09 linking task Stephan Raaijmakers Corné Versloot Joost de Wit TNO Information and Communication Technology Delft, The Netherlands {stephan.raaijmakers,corne.versloot,

More information

Creating Large-scale Training and Test Corpora for Extracting Structured Data from the Web

Creating Large-scale Training and Test Corpora for Extracting Structured Data from the Web Creating Large-scale Training and Test Corpora for Extracting Structured Data from the Web Robert Meusel and Heiko Paulheim University of Mannheim, Germany Data and Web Science Group {robert,heiko}@informatik.uni-mannheim.de

More information

The Accolades Project on Large Textual Corpora - Working Paper

The Accolades Project on Large Textual Corpora - Working Paper The Accolades Project on Large Textual Corpora - Working Paper The content of this Working Paper was first presented at the University of Edinburgh 19th - 22nd May 2010 thanks to a network research meeting

More information

3 Publishing Technique

3 Publishing Technique Publishing Tool 32 3 Publishing Technique As discussed in Chapter 2, annotations can be extracted from audio, text, and visual features. The extraction of text features from the audio layer is the approach

More information

Digital Library Curriculum Development Module 4-b: Metadata Draft: 6 May 2008

Digital Library Curriculum Development Module 4-b: Metadata Draft: 6 May 2008 Digital Library Curriculum Development Module 4-b: Metadata Draft: 6 May 2008 1. Module name: Metadata 2. Scope: This module addresses uses of metadata and some specific metadata standards that may be

More information

Final Report. Phase 2. Virtual Regional Dissertation & Thesis Archive. August 31, Texas Center Research Fellows Grant Program

Final Report. Phase 2. Virtual Regional Dissertation & Thesis Archive. August 31, Texas Center Research Fellows Grant Program Final Report Phase 2 Virtual Regional Dissertation & Thesis Archive August 31, 2006 Submitted to: Texas Center Research Fellows Grant Program 2005-2006 Submitted by: Fen Lu, MLS, MS Automated Services,

More information