Extending the Facets concept by applying NLP tools to catalog records of scientific literature

E. Picchi*, M. Sassi*, S. Biagioni**, S. Giannini**
* Institute of Computational Linguistics
** Institute of Information Science and Technologies
CNR, Italy

The context
National Research Council of Italy (CNR)
Institute of Computational Linguistics, DBTficio Laboratory: models and methods for natural language processing; monolingual and multilingual prototype applications.
Institute of Information Science and Technologies, Networked Multimedia Information Systems Laboratory & Library: digital library management systems.

The target
The aim is to present the prototype of an intelligent navigation system named DBT&Facets, which has been implemented on the full bibliographic records of the documents archived in PUMA, the digital library of the Italian National Research Council (CNR): http://puma.isti.cnr.it
The system integrates the core textual search engine (known as DBT and developed by ILC) with the TextPower (TP) technology: http://serverdbt.ilc.cnr.it/sitodbt/

PUMA repositories
PUMA is a user-focused, service-oriented infrastructure that manages an increasing number of CNR institutional repositories containing about 25,000 published or open access documents in a wide variety of disciplines.
PUMA archives the metadata (qualified DC + administrative metadata) and the full texts of the following document types:
Published literature: journal article, conference and workshop paper, book and contribution to book, guest editorial.
Grey literature: conference presentation, workshop and meeting paper, communication, poster/abstract, preprint, technical report, project report, internal note, PhD thesis, guest editorial, other materials (e.g. courses, tutorials, etc.).

TP: TextPower technology
TP is based on NLP techniques and linguistic resources used to create tools for the evaluation, analysis, classification and browsing of information related to the domains of scientific literature.
TP enriches documents by extracting implicit knowledge from their texts; this extraction is a specialization of the "Facets" technology.

The facet concept
The facet concept is specific to the field of Archives and Library Science, but it is also used in information retrieval systems.
In Library Science, the term "facet" identifies the elements of a structured material such as a library catalog, each characterized by the code of a field and its contents.
TP extends the facet concept by extracting field + content pairs not only from structured fields but also from free text, e.g. abstracts, using a linguistic-statistical approach to annotate relevant terminology, named entities, etc.
The enriched text can be queried, analysed, and classified using DBT&Facets.
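The idea of field + content pairs drawn from both structured fields and free text can be illustrated with a minimal Python sketch. It is not the TP implementation: the record layout, the function names, the stopword list and the frequency threshold are all assumptions made for illustration, and simple token counting stands in for the linguistic-statistical annotation, whose details the slides do not give. The sample record is invented.

```python
import re
from collections import Counter

# Assumption: a tiny stopword list; a real system would use proper linguistic resources.
STOPWORDS = {"the", "of", "and", "a", "in", "to", "for", "is", "on", "by", "with", "are", "an"}

def structured_facets(record):
    """Return (field, content) pairs from the structured catalog fields."""
    pairs = []
    for field, value in record.items():
        if field == "abstract":  # free text is handled separately
            continue
        values = value if isinstance(value, list) else [value]
        pairs.extend((field, str(v)) for v in values)
    return pairs

def free_text_facets(text, min_count=2):
    """Toy stand-in for terminology annotation: keep tokens that are
    not stopwords and occur at least min_count times."""
    tokens = [t for t in re.findall(r"[a-z]+", text.lower())
              if t not in STOPWORDS and len(t) > 2]
    counts = Counter(tokens)
    return [("term", tok) for tok, n in sorted(counts.items()) if n >= min_count]

# Hypothetical record, invented for this example.
record = {
    "author": ["Rossi, M."],
    "year": 2009,
    "abstract": "Grey literature repositories index grey literature metadata. "
                "Metadata quality matters.",
}

facets = structured_facets(record) + free_text_facets(record["abstract"])
for field, content in facets:
    print(f"{field}: {content}")
```

Both kinds of pair share one shape, so a single index can serve catalog fields and terms mined from abstracts alike.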

The prototype
DBT&Facets is an advanced search tool that lets users query, refine their results, and identify particular relations between them.
[Diagram: the PUMA multidisciplinary repositories (ca. 25,000 records and abstracts) are enriched and elaborated, then served through the intelligent navigation system.]
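The query-and-refine interaction can be sketched as a plain faceted search loop: run a text query, count the values of a facet over the hits, then narrow the hits to one value. This is a hedged illustration only; the function names and the sample records are hypothetical and do not reflect the DBT engine's actual interface.

```python
from collections import Counter

def search(records, word):
    """Return records whose abstract contains the word (case-insensitive)."""
    return [r for r in records if word.lower() in r["abstract"].lower()]

def facet_counts(records, field):
    """Count how many records carry each value of the given facet field."""
    return Counter(r[field] for r in records)

def refine(records, field, value):
    """Keep only the records whose facet field equals the chosen value."""
    return [r for r in records if r[field] == value]

# Hypothetical records, invented for this example.
records = [
    {"id": 1, "institute": "ILC", "year": 2008, "abstract": "Grey literature and NLP"},
    {"id": 2, "institute": "ISTI", "year": 2009, "abstract": "Grey literature in digital libraries"},
    {"id": 3, "institute": "ISTI", "year": 2009, "abstract": "Multimedia retrieval"},
]

hits = search(records, "grey")                 # step 1: simple search
by_institute = facet_counts(hits, "institute") # step 2: show facet values with counts
isti_hits = refine(hits, "institute", "ISTI")  # step 3: refine on one facet value
print(by_institute)
print([r["id"] for r in isti_hits])
```

Counting facet values over the current hits, rather than over the whole collection, is what lets the interface suggest only refinements that will return results.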

The PUMA query sample
[Screenshot: simple search for "grey".]

The prototype query sample 1
CATEGORIES
A  Collections
B  Institute
C  Author
D  Language of Summary
E  Language of Document
F  Free Subjects
G  Year of Publication

The prototype query sample 2
CATEGORIES
H  Sub-Type
I  Type
J  Other Language of Summary
K  Edited by
L  Title of Journal
M  Title of Event
N  Title
O  Abstract in English

The graph built on a selected author
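The slides do not say how this graph is constructed. One plausible reading, offered purely as an assumption, is a co-occurrence graph: the selected author is linked to every facet value (co-authors, subjects) that appears in the same records, weighted by how often they occur together. The function name, field names and sample records below are hypothetical.

```python
from collections import Counter

def author_graph(records, author, fields=("author", "subject")):
    """Return weighted edges from the selected author to the facet values
    that co-occur with that author, as {(field, value): count}."""
    edges = Counter()
    for rec in records:
        if author not in rec.get("author", []):
            continue  # only records written by the selected author contribute
        for field in fields:
            for value in rec.get(field, []):
                if field == "author" and value == author:
                    continue  # skip the self-link
                edges[(field, value)] += 1
    return edges

# Hypothetical records, invented for this example.
records = [
    {"author": ["A", "B"], "subject": ["nlp"]},
    {"author": ["A", "C"], "subject": ["nlp", "facets"]},
    {"author": ["B"], "subject": ["grids"]},
]

graph = author_graph(records, "A")
for (field, value), weight in sorted(graph.items()):
    print(f"A -- {field}:{value} (weight {weight})")
```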

Conclusions
In an open domain such as scientific documentation, our approach, based on criteria of semantic similarity, is useful and perhaps more objective than one based on hierarchical elements, because it makes it possible to link different types of information, across domains if necessary.
The aim of the project has been to structure a knowledge system of domain-specific information that assists users by suggesting possible directions for their search.