TEXT MINING: THE NEXT DATA FRONTIER

An Infrastructural Approach
Dr. Petr Knoth
CORE (core.ac.uk), Knowledge Media Institute, The Open University, United Kingdom

OpenMinTeD
Establish an open and sustainable Text and Data Mining (TDM) platform and infrastructure where researchers can collaboratively create, discover, share and re-use knowledge from a wide range of text-based scientific and scholarly sources.

Beyond Open Access: making sense of large volumes of scientific content

OPENMINTED - The Open Mining Infrastructure for Text and Data

The phases of text mining
[Figure: four-stage diagram (Stage 1 to Stage 4). Labels: Information Retrieval; NLP Analysis; Entity Recognition; Information Extraction; Data Mining; Knowledge Discovery]
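The phases named on this slide can be illustrated with a toy pipeline. This is only a sketch: the corpus, the function names and the rules inside each step (keyword matching, capitalised tokens as entities, a single trigger verb) are assumptions made for illustration, not OpenMinTeD components.

```python
import re
from collections import Counter

# Toy corpus standing in for a document collection (illustrative data only).
DOCS = {
    "d1": "Aspirin inhibits COX1 in human platelets.",
    "d2": "Ibuprofen inhibits COX2 and reduces inflammation.",
    "d3": "Open access publishing grows every year.",
}

def retrieve(docs, keyword):
    """Information retrieval: keep documents that mention the keyword."""
    return {doc_id: text for doc_id, text in docs.items()
            if keyword.lower() in text.lower()}

def tokenize(text):
    """NLP analysis, reduced here to word tokenisation."""
    return re.findall(r"[A-Za-z0-9]+", text)

def recognise_entities(tokens):
    """Entity recognition, reduced to: capitalised tokens are entities."""
    return [t for t in tokens if t[0].isupper()]

def extract_relations(tokens, verb="inhibits"):
    """Information extraction: (subject, verb, object) around a trigger verb."""
    relations = []
    for i, tok in enumerate(tokens):
        if tok == verb and 0 < i < len(tokens) - 1:
            relations.append((tokens[i - 1], verb, tokens[i + 1]))
    return relations

def mine(relations):
    """Data mining: count how often each subject appears in a relation."""
    return Counter(subj for subj, _, _ in relations)

def run_pipeline(docs, keyword):
    """Chain the steps: retrieve, analyse, recognise, extract, then mine."""
    all_relations = []
    for text in retrieve(docs, keyword).values():
        tokens = tokenize(text)
        entities = set(recognise_entities(tokens))
        # Keep only relations whose two arguments were recognised as entities.
        all_relations += [r for r in extract_relations(tokens)
                          if r[0] in entities and r[2] in entities]
    return all_relations, mine(all_relations)

relations, counts = run_pipeline(DOCS, "inhibits")
print(relations)
print(counts)
```

Each function consumes the output of the previous one, which is why the interoperability of formats between steps matters once the steps are provided by different tools.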

TDM challenges for researchers
1. Content challenges
- Barriers and obstacles due to non-availability, technical restrictions, copyright law or licensing issues
- No uniform way to search for, retrieve and access content for TDM

TDM challenges for researchers
2. Services challenges
- How to identify the most fitting TDM service?
- How to combine with other TDM services I have access to?
- How to use them on my content?

TDM challenges for researchers
3. Processing challenges
- Where to deploy?
- Are my machines powerful enough?
- How can I get access to powerful machines?
- Where to store intermediate and final results?
- How to ensure persistence of storage?

OpenMinTeD provides solutions
An open and sustainable TDM infrastructure where researchers can collaboratively create, discover, share and re-use knowledge from a wide range of text-based, science-related sources.

OpenMinTeD working on many fronts
- ACCESSIBLE CONTENT: Via standardised programmatic interfaces
- DISCOVERABLE SERVICES: Well-documented, easily discoverable text mining services and workflows which process, analyse and annotate text
- EFFICIENT PROCESSING: Operate on public e-infrastructures via standardised APIs
- RESEARCH COMMUNITIES: Different scientific communities have different challenges
- VALUE ADDED APPS: Community-driven applications to illustrate the value of the infrastructure. Engage with industry.

The project
Started: June 2015
Duration: 3 years
Budget: 6 million
Grant: 5.3 million
16 partners:
- 6 mining research groups
- 3 content providers
- 1 data center
- 1 library association
- 2 legal experts
- 6 community related partners
- 2 SMEs
Partners: Athena RIC; Univ. of Manchester (NaCTeM); Univ. of Darmstadt; INRA; EMBL-EBI; Agro-Know; LIBER; Univ. of Amsterdam; Open University UK (CORE); EPFL; CNIO; Univ. of Sheffield (GATE); GESIS; GRNET; Frontiers; Univ. of Stirling

The OpenMinTeD landscape

Infrastructural approach
OpenMinTeD does not build new services, but adopts and adapts existing services for new communities

Infrastructural approach
Focuses on interoperability across text mining services and content provision outlets

Infrastructural approach
Creates an open and collaborative space for researchers to use the best-fitting text mining services available, building on the cloud computing philosophy

Overview
[Figure: layered architecture diagram]
Users: researchers, curators, text-miners and new services developers
Platform services: Registry; Auth2 & Policy management; Workflow Management; Annotator; Accounting
Layer 1: Interoperability of text mining services (platforms or components)
Layer 2: Interoperability of language resources & corpora
Layer 3: Interoperability to shared storage and computing resources
Diagram elements:
- Mining platforms; proprietary architectures
- Language resources and corpora registry service; language resources
- Text corpora: publisher text corpus, OpenAIRE/CORE text corpus, PMC text corpus, other text corpora, other types of text corpora
- Data centres; data centre in public cloud

Interoperability framework
Bringing together mining tools, resources and content
1. Content metadata & transfer standards
To document scientific literature, language resources, taxonomies and provenance, as well as transfer protocols for full text retrieval

Interoperability framework
Bringing together mining tools, resources and content
2. Service metadata & pipelining
To document and classify text mining services, how they receive input, in what form they output their results, how they combine for workflows, what granularity to consider.
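As an illustration of why service metadata matters for pipelining, the sketch below describes each service by its task, input format, output format and granularity, then checks whether a sequence of services can be chained. The record fields, service names and format labels are hypothetical and do not reproduce the OpenMinTeD metadata schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ServiceMetadata:
    """Hypothetical description of one text mining service."""
    name: str
    task: str           # how the service is classified
    input_format: str   # the form in which it receives input
    output_format: str  # the form in which it outputs results
    granularity: str    # e.g. "document" or "sentence"

def validate_workflow(services):
    """Return a list of problems; an empty list means the chain is compatible."""
    problems = []
    # Compare each service with the one that follows it in the workflow.
    for upstream, downstream in zip(services, services[1:]):
        if upstream.output_format != downstream.input_format:
            problems.append(
                f"{upstream.name} outputs {upstream.output_format}, "
                f"but {downstream.name} expects {downstream.input_format}"
            )
    return problems

# Illustrative services; names and format labels are made up.
pdf_to_text = ServiceMetadata(
    "pdf-to-text", "format conversion", "application/pdf", "text/plain", "document")
sentence_splitter = ServiceMetadata(
    "sentence-splitter", "segmentation", "text/plain", "annotations/json", "sentence")
entity_tagger = ServiceMetadata(
    "entity-tagger", "entity recognition", "annotations/json", "annotations/json", "sentence")

print(validate_workflow([pdf_to_text, sentence_splitter, entity_tagger]))
print(validate_workflow([pdf_to_text, entity_tagger]))
```

With descriptions of this kind, a workflow manager can reject an incompatible chain before any document is processed, rather than failing halfway through a run.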

Interoperability framework
Bringing together mining tools, resources and content
3. IPR and licensing
To study IPR restrictions, describe license metadata for re-use, for content and TDM services & tools, and information on how to apply for academic and non-commercial mining research

OpenMinTeD users
1. End users
- Researchers, database curators
- Novice: use services to advance their science
- Advanced: integrate TDM services into complex workflows

OpenMinTeD users
2. Content and service providers
- Publishers, libraries, scientific database centres
- TDM researchers
- SMEs

Bottom-up approach
OpenMinTeD works with 4 use cases, which give their requirements and evaluate the results:
- Research analytics
- Life sciences
- Agriculture
- Social sciences

OpenMinTeD use case 1: Scholarly communication analytics
- Semantic search and discovery of open scientific outcomes
- Map of academia scholarly communication network
- Research monitoring and analytics
Partners: CORE/OU, OpenAIRE/ARC, Frontiers

OpenMinTeD use case 2: Life sciences
- Assisted curation of the EMBL-EBI chemical databases for metabolomics
- Curation of the neurosciences resources KnowledgeBase and Neurolex
Partners: EBI - Metabolomics, Human Brain Project

OpenMinTeD use case 3: Agriculture and biodiversity
- Enrich agricultural databases to assist food- and water-borne disease outbreak alerts and product recalls
- Image, figure and dataset discovery in AGRIS
Partners: INRA, AGRO-KNOW

OpenMinTeD use case 4: Social sciences
- Develop and evaluate methods for the automatic detection and linking of named entities, citation traces and intentions in social science publications
Partners: GESIS

What can OpenMinTeD do for you?
Are you a content provider?
- Make your content available for mining
- Register your collections in the OpenMinTeD registry and let others discover them

What can OpenMinTeD do for you?
Are you a TDM service provider?
- Share and collaborate with other TDM services
- Register your TDM service in the OpenMinTeD registry and let others discover it
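The register-and-discover idea for collections and services can be pictured with a small in-memory registry. This is a sketch under stated assumptions: the class, its methods, the entry fields and the example entries are all hypothetical, and they are not the OpenMinTeD registry interface.

```python
from dataclasses import dataclass, field

@dataclass
class RegistryEntry:
    """Hypothetical registry record for a collection or a TDM service."""
    kind: str       # "collection" or "service"
    name: str
    provider: str
    keywords: set = field(default_factory=set)

class Registry:
    """Toy in-memory registry: providers register, other users discover."""

    def __init__(self):
        self._entries = {}

    def register(self, entry):
        key = (entry.kind, entry.name)
        if key in self._entries:
            raise ValueError(f"{entry.kind} {entry.name!r} is already registered")
        self._entries[key] = entry

    def discover(self, kind, keyword):
        """Return the sorted names of entries of this kind tagged with keyword."""
        wanted = keyword.lower()
        return sorted(
            entry.name for entry in self._entries.values()
            if entry.kind == kind
            and wanted in {k.lower() for k in entry.keywords}
        )

# Illustrative entries; names and providers are made up.
registry = Registry()
registry.register(RegistryEntry(
    "collection", "open-access-fulltext", "example content provider",
    {"open access", "full text"}))
registry.register(RegistryEntry(
    "service", "entity-tagger", "example TDM group",
    {"entity recognition"}))

print(registry.discover("collection", "Open Access"))
print(registry.discover("service", "entity recognition"))
```

Keeping collections and services in one registry, distinguished only by kind, reflects the slides' point that both content providers and service providers make their resources discoverable in the same place.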

What can OpenMinTeD do for you?
Are you a text miner or a researcher who can benefit from text mining?
- Use OpenMinTeD (when launched)

Conclusions
- The ability to text-mine research literature at scale can redefine the way we do research
- OpenMinTeD is laying the groundwork (interoperability) and building the cloud infrastructure for text-mining research literature
- Building an open, transparent infrastructure that enables others to participate

Contact us: www.openminted.eu
twitter.com/openminted_eu
facebook.com/openminted
bit.do/openmintedlinkedin
vimeo.com/openminted
bit.do/openmintedplus