NLP resources: construction, standardization, exploitation & API. Karim Bouzoubaa


1 NLP resources: construction, standardization, exploitation & API. Karim Bouzoubaa

2 Outline: Exploitation, NLP resources, Construction, Standardization, API

3 Exploitation

4 Exploitation: language resources (LRs) are used in various NLP software tools:
- morphological and other analysis of texts
- spell-checking and paraphrasing
- search and text mining

5 Outline: Exploitation, NLP resources, Construction, Standardization, API

6 NLP Resources

7 Resources: Introduction, Definition, Types, Examples, Evaluation criteria

8 Introduction - Definition
- The key to NLT development is the language resource
- Resource production takes a lot of effort and is very expensive
- Example: the Arabic standard LC-STAR phonetic lexicon of the European Language Resources Association (ELRA) with 110,271 entries costs EUR (for use in academic research)
- Language resources are language-related data, accessible in an electronic format, and used for the development of NLP systems


10 Types: two categories
1. Corpus
   - Written: monolingual texts, multilingual texts, annotated texts, treebanks
   - Speech: texts read aloud, speeches, dialogues, radio and television broadcasts
   - Multimedia: images, sounds and videos
2. Lexicon
   - Monolingual and multilingual dictionaries
   - Gazetteers (geographical dictionaries)
   - Terminologies
   - Ontologies

11 Content of a lexicon
An entry in the lexicon may contain morphological, syntactic, semantic and pragmatic information:
- the grammatical category (noun, verb, etc.)
- subcategory properties (transitive verb or not, masculine or feminine)
- semantic information (animate noun, verb requiring a human subject)
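The entry content listed above can be pictured as a small record type. The following Python sketch is only an illustration under stated assumptions: the class name `LexiconEntry`, its field names, the helper functions and the sample English entries are all hypothetical and do not come from the slides or from any particular lexicon format.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class LexiconEntry:
    """One lexicon entry (hypothetical layout, for illustration only)."""
    lemma: str
    category: str                                   # grammatical category: noun, verb, ...
    subcategory: Dict[str, str] = field(default_factory=dict)   # e.g. transitivity, gender
    semantics: List[str] = field(default_factory=list)          # e.g. "animate"


def add_entry(lexicon: Dict[str, List[LexiconEntry]], entry: LexiconEntry) -> None:
    """Store an entry under its lemma; one lemma may have several entries."""
    lexicon.setdefault(entry.lemma, []).append(entry)


def lemmas_by_category(lexicon: Dict[str, List[LexiconEntry]], category: str) -> List[str]:
    """Return the sorted lemmas that have at least one entry of the given category."""
    return sorted(
        lemma
        for lemma, entries in lexicon.items()
        if any(e.category == category for e in entries)
    )


lexicon: Dict[str, List[LexiconEntry]] = {}
add_entry(lexicon, LexiconEntry("eat", "verb",
                                {"transitive": "yes"},
                                ["requires an animate subject"]))
add_entry(lexicon, LexiconEntry("teacher", "noun",
                                {"gender": "masculine"},
                                ["animate"]))

print(lemmas_by_category(lexicon, "verb"))
```

Keeping the subcategory properties in a dictionary rather than fixed fields is one possible design choice: it lets each grammatical category carry its own properties without changing the record type.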

12 Examples

13 Oxford dictionary

14 VerbNet

15 Evaluation criteria
- Formal (regardless of content): size; maintenance (durability, scalability); compatibility
- Functional (language criteria): lexicographic annotation (existence and relevance); intrinsic rules

16 Outline: Exploitation, NLP resources, Construction, Standardization, API

17 Construction

18 Production cycle; resources; example (Contemporary Arabic); reusing resources; example of free resources; good practices; interoperability; viability

19 Creating resources: two approaches for developing LRs:
- creating new resources
- tuning existing resources

20 Creating resources: collect data, of a general nature or belonging to a particular sector of activity, directly in digital form or, in some cases, by digitizing it.

21 Example of creating resources: Contemporary Arabic

22 Resources reuse
- The operation of making changes to a resource for the purpose of performing certain functions and improving it in a usage environment different from the original one
- Example: ...

23 Example of free resources
Corpus:
- Corpus of Contemporary Arabic
- Khoja POS-tagged corpus
- Quranic Arabic
- Collection of free Arabic texts and books: Almeshkat, Al-Eman
Lexicon:
- Buckwalter's list of Arabic roots
- Al-Baheth Al-Arabi

24 Good practices: in order to contribute to the development of a set of sustainable LRs, some principles must be respected:
- Resource documentation
- Interoperability of resources

25 Documentation of resources
LRs are often poorly documented or not documented at all. Documentation should be as comprehensive as possible, and include information on:
- the format of the data
- the content of the data
- the context
- the possible uses
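The four documentation items above lend themselves to a simple completeness check. The sketch below is a hypothetical illustration, not a standard metadata schema: the class `ResourceDoc`, its field names and the function `missing_fields` are assumptions introduced here to show how undocumented aspects of a resource could be flagged.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class ResourceDoc:
    """Documentation record for a language resource (hypothetical field names)."""
    name: str
    data_format: str = ""     # the format of the data
    content: str = ""         # the content of the data
    context: str = ""         # the context in which the data was produced
    possible_uses: str = ""   # the possible uses


def missing_fields(doc: ResourceDoc) -> List[str]:
    """List the documentation fields that are still empty or blank."""
    return [f.name for f in fields(doc) if not getattr(doc, f.name).strip()]


# A made-up resource with only its format documented.
doc = ResourceDoc(name="demo corpus", data_format="plain text, UTF-8")
print(missing_fields(doc))
```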

26 Resources interoperability
- The interoperability of LRs is their ability to operate in different systems
- The formats of the LRs must be standard

27 Interoperability, documentation, reuse: many difficulties are encountered when reusing available LRs

28 Interoperability, documentation, reuse: contribute to the development of LRs respecting interoperability rules:
- Availability
- Portability
- Reusability
- Normalization

29 Outline: Exploitation, NLP resources, Construction, Standardization, API

30 Standardization

31 Why?
- How to integrate existing resources into one's own contexts?
- How to separate the resources from the tools that manage them?

32 Panorama
Standardization agencies:
- CNIS: China National Institute of Standardization
- AFNOR: Association Française de Normalisation
- DIN: Deutsches Institut für Normung
- ANSI: American National Standards Institute
- W3C: World Wide Web Consortium
- TEI: Text Encoding Initiative
- ISO: International Organization for Standardization
Projects:
- LIRICS: Linguistic Infrastructure for Interoperable Resources and Systems
- EAGLES: Expert Advisory Group on Language Engineering Standards
- Multext: Multilingual Text Tools and Corpora
Research structures:
- CLARIN: Common Language Resources and Technology Infrastructure
- FLaReNet: Fostering Language Resources Network
- Alpage: Analyse Linguistique Profonde à Grande Échelle

33 Organization

34 Standards proposal: stages and documents
- Preliminary: Preliminary Work Item (PWI)
- Proposal: New Work Item Proposal (NP)
- Preparatory: new project of the WG
- Committee: Committee Draft (CD)
- Enquiry: Draft International Standard (DIS)
- Approval: Final Draft International Standard (FDIS)
- Publication: International Standard (IS)

35 LMF
- Modeling Arabic paradigms according to the LMF standard (Aïda Khemakhem et al.)
- ... conversion of editorial ... to LMF (Feten Baccar et al. 2008; Aïda Khemakhem et al.)
- Domain ontology ... from LMF (Feten Baccar et al.)
- Proposed standardized ... of standard Arabic lexicons (Susanne Salmon-Alt et al. 2013)
- ... of anomalies and ... of the content of LMF ... (Wafa Wali et al.)
- ... of a system of production of Arabic dictionaries respecting the LMF standard (Mohammed Reqqass et al. 2014)

36 LMF Example

37 LMF Example

38 TEI
<TEI>
  <teiheader>
    <name>NAFIS Arabic Stemming Gold Standard</name>
    ...
  </teiheader>
  <text>
    <phr>
      <val>عليكم بالجد فإنه أساس النجاح</val>
      <w rend="عليكم">
        <choice n="14">
          <seg>
            <m type="prefix"></m>
            <form type="base">
              <m type="root">علي</m>
              <m type="stem">ع ل ي</m>
            </form>
            <m type="suffix">كم</m>
          </seg>
          <seg>
            <m type="prefix"></m>
            <form type="base">
              <m type="root">علي</m>
              <m type="stem">عل ي</m>
            </form>
            <m type="suffix">كم</m>
          </seg>
          ...
        </choice>
      </w>
    </phr>
    ...
  </text>
</TEI>
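A file structured like the TEI excerpt above can be read with Python's standard `xml.etree.ElementTree` module. The sketch below is a minimal illustration under assumptions: the embedded `SAMPLE` string is a shortened, hand-written fragment that only mimics the element names on the slide (`w`, `choice`, `seg`, `form`, `m`), and the function name `read_analyses` is hypothetical. It is not the official NAFIS tooling.

```python
import xml.etree.ElementTree as ET

# Shortened, hand-written fragment that imitates the slide's structure.
SAMPLE = """<TEI>
  <text>
    <phr>
      <w rend="عليكم">
        <choice n="2">
          <seg>
            <m type="prefix"></m>
            <form type="base">
              <m type="root">علي</m>
              <m type="stem">ع ل ي</m>
            </form>
            <m type="suffix">كم</m>
          </seg>
          <seg>
            <m type="prefix"></m>
            <form type="base">
              <m type="root">علي</m>
              <m type="stem">عل ي</m>
            </form>
            <m type="suffix">كم</m>
          </seg>
        </choice>
      </w>
    </phr>
  </text>
</TEI>"""


def read_analyses(xml_text):
    """Map each word (the rend attribute of <w>) to its list of segmentations.

    Each segmentation is a dict from the <m> type attribute to its text.
    """
    tree_root = ET.fromstring(xml_text)
    result = {}
    for word in tree_root.iter("w"):
        analyses = []
        for seg in word.iter("seg"):
            # iter("m") also reaches the <m> elements nested inside <form>.
            analyses.append({m.get("type"): (m.text or "") for m in seg.iter("m")})
        result[word.get("rend")] = analyses
    return result


analyses = read_analyses(SAMPLE)
for word, segmentations in analyses.items():
    print(word, len(segmentations), segmentations[0]["suffix"])
```

Because the resource is plain XML, any tool can read it without depending on the software that produced it, which is the separation of resources from tools that the standardization slides argue for.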


A Method for Semi-Automatic Ontology Acquisition from a Corporate Intranet A Method for Semi-Automatic Ontology Acquisition from a Corporate Intranet Joerg-Uwe Kietz, Alexander Maedche, Raphael Volz Swisslife Information Systems Research Lab, Zuerich, Switzerland fkietz, volzg@swisslife.ch

More information

Vulnerability Analysis (III): Sta8c Analysis

Vulnerability Analysis (III): Sta8c Analysis Computer Security Course. Vulnerability Analysis (III): Sta8c Analysis Slide credit: Vijay D Silva 1 Efficiency of Symbolic Execu8on 2 A Sta8c Analysis Analogy 3 Syntac8c Analysis 4 Seman8cs- Based Analysis

More information

Knowledge Engineering with Semantic Web Technologies

Knowledge Engineering with Semantic Web Technologies This file is licensed under the Creative Commons Attribution-NonCommercial 3.0 (CC BY-NC 3.0) Knowledge Engineering with Semantic Web Technologies Lecture 5: Ontological Engineering 5.3 Ontology Learning

More information

The challenge of collecting and evaluating LRs for commercial use

The challenge of collecting and evaluating LRs for commercial use Language Technologies Observatory The challenge of collecting and evaluating LRs for commercial use www.lt-observatory.eu Bente Maegaard, CLARIN ERIC (and University of Copenhagen) Overview of the challenges

More information

Lisa Biagini & Eugenio Picchi, Istituto di Linguistica CNR, Pisa

Lisa Biagini & Eugenio Picchi, Istituto di Linguistica CNR, Pisa Lisa Biagini & Eugenio Picchi, Istituto di Linguistica CNR, Pisa Computazionale, INTERNET and DBT Abstract The advent of Internet has had enormous impact on working patterns and development in many scientific

More information

LIDER: FP Linked Data as an enabler of cross-media and multilingual. analytics for enterprises across Europe. Phase II

LIDER: FP Linked Data as an enabler of cross-media and multilingual. analytics for enterprises across Europe. Phase II LIDER: FP7 610782 Linked Data as an enabler of cross-media and multilingual content analytics for enterprises across Europe Deliverable number Deliverable title Main Authors D4.4.3 Updated Project Fact

More information

Unit 3 Corpus markup

Unit 3 Corpus markup Unit 3 Corpus markup 3.1 Introduction Data collected using a sampling frame as discussed in unit 2 forms a raw corpus. Yet such data typically needs to be processed before use. For example, spoken data

More information

What makes an applica/on a good applica/on? How is so'ware experienced by end- users? Chris7an Campo EclipseCon 2012

What makes an applica/on a good applica/on? How is so'ware experienced by end- users? Chris7an Campo EclipseCon 2012 What makes an applica/on a good applica/on? How is so'ware experienced by end- users? Chris7an Campo EclipseCon 2012 Who are we? Chris/an Campo How is so:ware experienced by end- users? What is Usability?

More information

Alignment and Image Comparison. Erik Learned- Miller University of Massachuse>s, Amherst

Alignment and Image Comparison. Erik Learned- Miller University of Massachuse>s, Amherst Alignment and Image Comparison Erik Learned- Miller University of Massachuse>s, Amherst Alignment and Image Comparison Erik Learned- Miller University of Massachuse>s, Amherst Alignment and Image Comparison

More information

COLDIC, a Lexicographic Platform for LMF Compliant Lexica

COLDIC, a Lexicographic Platform for LMF Compliant Lexica COLDIC, a Lexicographic Platform for LMF Compliant Lexica Núria Bel, Sergio Espeja, Montserrat Marimon, Marta Villegas Institut Universitari de Lingüística Aplicada Universitat Pompeu Fabra Pl. de la Mercè,

More information

Introduction of ISO/IEC JTC1 SC 38 & its standard work on cloud computing. Junfeng ZHAO

Introduction of ISO/IEC JTC1 SC 38 & its standard work on cloud computing. Junfeng ZHAO Introduction of ISO/IEC JTC1 SC 38 & its standard work on cloud computing Junfeng ZHAO 2011.3.23 Agenda Introduction of ISO/IEC JTC1 /SC 38 Introduction of ISO/IEC JTC1 /SC 38 SG1 Introduction of On-going

More information

Architectural Requirements Phase. See Sommerville Chapters 11, 12, 13, 14, 18.2

Architectural Requirements Phase. See Sommerville Chapters 11, 12, 13, 14, 18.2 Architectural Requirements Phase See Sommerville Chapters 11, 12, 13, 14, 18.2 1 Architectural Requirements Phase So7ware requirements concerned construc>on of a logical model Architectural requirements

More information

The Web Enabling Company

The Web Enabling Company The Web Enabling Company Integrating Linguistic Products into Corporate Applications Elisabeth Maier Canoo Engineering AG Basel-Switzerland elisabeth.maier@canoo.com www.canoo.com, www.canoo.net Page 1

More information

Modeling and transforming a multilingual technical lexicon for conservation-restoration using XML

Modeling and transforming a multilingual technical lexicon for conservation-restoration using XML Modeling and transforming a multilingual technical lexicon for conservation-restoration using XML Alice Lonati 1, Violetta Lonati 2, and Massimo Santini 2 1 Associazione Giovanni Secco Suardo Lurano, Italy

More information

REDCap Data Dic+onary

REDCap Data Dic+onary REDCap Data Dic+onary ITHS Biomedical Informa+cs Core iths_redcap_admin@uw.edu Bas de Veer MS Research Consultant REDCap version: 6.2.1 Last updated December 9, 2014 1 Goals & Agenda Goals CraDing your

More information

Extending the Facets concept by applying NLP tools to catalog records of scientific literature

Extending the Facets concept by applying NLP tools to catalog records of scientific literature Extending the Facets concept by applying NLP tools to catalog records of scientific literature *E. Picchi, *M. Sassi, **S. Biagioni, **S. Giannini *Institute of Computational Linguistics **Institute of

More information

Converting and Representing Social Media Corpora into TEI: Schema and Best Practices from CLARIN-D

Converting and Representing Social Media Corpora into TEI: Schema and Best Practices from CLARIN-D Converting and Representing Social Media Corpora into TEI: Schema and Best Practices from CLARIN-D Michael Beißwenger, Eric Ehrhardt, Axel Herold, Harald Lüngen, Angelika Storrer Background of this talk:

More information

For more information about how to cite these materials visit

For more information about how to cite these materials visit Author(s): Jeremy York, 2010 License: Unless otherwise noted, this material is made available under the terms of the Creative Commons Attribution Noncommercial Share Alike 3.0 License: http://creativecommons.org/licenses/by-nc-sa/3.0/

More information

ISO/ CEN Standardization Status European Citizen Card Lorenzo Gaston

ISO/ CEN Standardization Status European Citizen Card Lorenzo Gaston ISO/ CEN Standardization Status European Citizen Card Lorenzo Gaston ETSI 16 th -17 th Jan 2007 CEN/ TC224 WG15 European Citizen Card Standardization of ID cards for Public Administration, including but

More information

Principles of Programming Languages

Principles of Programming Languages Principles of Programming Languages h"p://www.di.unipi.it/~andrea/dida2ca/plp- 14/ Prof. Andrea Corradini Department of Computer Science, Pisa Lesson 11! Syntax- Directed Transla>on The Structure of the

More information

Standards for language encoding: ISO

Standards for language encoding: ISO Standards for language encoding: ISO Tomaž Erjavec Dept. of Knowledge Technologies Jožef Stefan Institute ESSLLI 2011 Overview of the lecture 1. How ISO works 2. ISO TC 37 3. Dates, times & languages 4.

More information

What were his cri+cisms? Classical Methodologies:

What were his cri+cisms? Classical Methodologies: 1 2 Classifica+on In this scheme there are several methodologies, such as Process- oriented, Blended, Object Oriented, Rapid development, People oriented and Organisa+onal oriented. According to David

More information