TEXT MINING: THE NEXT DATA FRONTIER
An Infrastructural Approach
Dr. Petr Knoth, CORE (core.ac.uk), Knowledge Media Institute, The Open University, United Kingdom
OpenMinTeD: Establish an open and sustainable Text and Data Mining (TDM) platform and infrastructure where researchers can collaboratively create, discover, share and re-use knowledge from a wide range of text-based scientific and scholarly sources.
Beyond Open Access
MAKING SENSE OF LARGE VOLUMES OF SCIENTIFIC CONTENT
OPENMINTED - The Open Mining Infrastructure for Text and Data
The phases of text mining
Stages 1 to 4 (figure): Information Retrieval, NLP Analysis, Entity Recognition, Information Extraction, Data Mining, Knowledge Discovery
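As a rough illustration of how the phases listed above could chain together, here is a minimal Python sketch. The grouping of the phase labels into four functions, the toy documents, and the entity dictionary are all assumptions made for this example; none of them come from OpenMinTeD itself.

```python
from collections import Counter
import re

# Toy corpus and entity dictionary, invented purely for illustration.
DOCUMENTS = {
    "doc1": "Aspirin reduces fever. Aspirin may irritate the stomach.",
    "doc2": "Ibuprofen reduces fever and inflammation.",
    "doc3": "Wheat yields depend on rainfall.",
}
ENTITY_TYPES = {
    "aspirin": "DRUG",
    "ibuprofen": "DRUG",
    "fever": "SYMPTOM",
    "inflammation": "SYMPTOM",
}


def retrieve(query, documents):
    """Information retrieval: keep documents that mention the query term."""
    return {doc_id: text for doc_id, text in documents.items()
            if query.lower() in text.lower()}


def analyse(text):
    """NLP analysis and entity recognition: split sentences, tokenise,
    and keep the tokens found in the entity dictionary."""
    sentences = [s for s in re.split(r"[.!?]\s*", text) if s]
    recognised = []
    for sentence in sentences:
        tokens = re.findall(r"[a-z]+", sentence.lower())
        recognised.append([t for t in tokens if t in ENTITY_TYPES])
    return recognised


def extract(entity_sentences):
    """Information extraction: pair each DRUG with each SYMPTOM
    mentioned in the same sentence."""
    pairs = []
    for entities in entity_sentences:
        drugs = [e for e in entities if ENTITY_TYPES[e] == "DRUG"]
        symptoms = [e for e in entities if ENTITY_TYPES[e] == "SYMPTOM"]
        pairs.extend((d, s) for d in drugs for s in symptoms)
    return pairs


def mine(pairs):
    """Data mining / knowledge discovery: count each pair across the corpus."""
    return Counter(pairs)


def run_pipeline(query, documents):
    """Run the four steps in order for one query."""
    pairs = []
    for text in retrieve(query, documents).values():
        pairs.extend(extract(analyse(text)))
    return mine(pairs)
```

Calling `run_pipeline("fever", DOCUMENTS)` retrieves the two documents that mention fever and counts the drug and symptom pairs found in them. A real pipeline would replace each toy function with a dedicated service.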
TDM challenges for researchers
1. Content challenges
- Barriers and obstacles due to non-availability, technical restrictions, copyright law or licensing issues
- No uniform way to search for, retrieve and access content for TDM
2. Services challenges
- How to identify the most fitting TDM service?
- How to combine it with other TDM services I have access to?
- How to use them on my content?
3. Processing challenges
- Where to deploy?
- Are my machines powerful enough?
- How can I get access to powerful machines?
- Where to store intermediate and final results?
- How to ensure persistence of storage?
OpenMinTeD provides solutions
An open and sustainable TDM infrastructure where researchers can collaboratively create, discover, share and re-use knowledge from a wide range of text-based, science-related sources.
OpenMinTeD working on many fronts
- ACCESSIBLE CONTENT: Via standardised programmatic interfaces
- DISCOVERABLE SERVICES: Well-documented, easily discoverable text mining services and workflows which process, analyse and annotate text
- EFFICIENT PROCESSING: Operate on public e-infrastructures via standardised APIs
- RESEARCH COMMUNITIES: Different scientific communities have different challenges
- VALUE ADDED APPS: Community-driven applications to illustrate the value of the infrastructure. Engage with industry.
The project
Started: June 2015
Duration: 3 years
Budget: 6 million
Grant: 5.3 million
16 partners:
- 6 mining research groups
- 3 content providers
- 1 data center
- 1 library association
- 2 legal experts
- 6 community related partners
- 2 SMEs
PARTNERS: Athena RIC; Univ. of Manchester (NaCTeM); Univ. of Darmstadt; INRA; EMBL-EBI; Agro-Know; LIBER; Univ. of Amsterdam; Open University UK (CORE); EPFL; CNIO; Univ. of Sheffield (GATE); GESIS; GRNET; Frontiers; Univ. of Stirling
The OpenMinTeD landscape
Infrastructural approach
- OpenMinTeD does not build new services, but adopts and adapts existing services for new communities
- Focuses on interoperability across text mining services and content provision outlets
- Creates an open and collaborative space for researchers to use the best-fitting text mining services available, building on the cloud computing philosophy
Overview (architecture diagram)
Users: researchers, curators, text miners and new service developers
Platform services: Registry, Auth2 & Policy management, Workflow Management, Annotator, Accounting
Layer 1: Interoperability of text mining services (platforms or components): mining platforms, proprietary architectures
Layer 2: Interoperability of language resources & corpora: language resources and corpora registry service, language resources
Layer 3: Interoperability to shared storage and computing resources: publisher text corpus, OpenAIRE/CORE text corpus, PMC text corpus, other text corpora, other types of text corpora, data centres, data centre in public cloud
Interoperability framework
Bringing together mining tools, resources and content
1. Content metadata & transfer standards: To document scientific literature, language resources, taxonomies and provenance as well as transfer protocols for full text retrieval
2. Service metadata & pipelining: To document and classify text mining services, how they receive input, in what form they output their results, how they combine for workflows, what granularity to consider
3. IPR and licensing: To study IPR restrictions, describe license metadata for re-use, for content and TDM services & tools, and information on how to apply for academic and noncommercial mining research
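The service metadata and pipelining item above says that services should document how they receive input, how they output results, and how they combine into workflows. A minimal Python sketch of that idea follows: a hypothetical metadata record, plus a check that each service's output can feed the next one's input. All field names, service names, format labels and licences are assumptions made for illustration, not the OpenMinTeD schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceDescription:
    """Hypothetical metadata record for a text mining service."""
    name: str
    input_format: str   # what the service accepts, e.g. "plain-text"
    output_format: str  # what the service produces, e.g. "tokens"
    licence: str        # licence label recorded for re-use decisions


def validate_workflow(services):
    """Return a list of problems; an empty list means every service's
    output format matches the input format of the service after it."""
    problems = []
    for upstream, downstream in zip(services, services[1:]):
        if upstream.output_format != downstream.input_format:
            problems.append(
                f"{upstream.name} outputs {upstream.output_format!r} "
                f"but {downstream.name} expects {downstream.input_format!r}"
            )
    return problems


# Illustrative entries; names, formats and licences are made up.
pdf_converter = ServiceDescription("pdf-converter", "pdf", "plain-text", "Apache-2.0")
tokeniser = ServiceDescription("tokeniser", "plain-text", "tokens", "Apache-2.0")
tagger = ServiceDescription("entity-tagger", "tokens", "annotations", "CC-BY-4.0")
```

With these entries, `validate_workflow([pdf_converter, tokeniser, tagger])` returns an empty list, while reordering the services produces a readable mismatch message. Comparing plain format strings is the simplest possible compatibility rule; a real registry would likely need richer type descriptions.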
OpenMinTeD users
1. End users
- Researchers, database curators
- Novice: use services to advance their science
- Advanced: combine TDM services into complex workflows
2. Content and service providers
- Publishers, libraries, scientific database centres
- TDM researchers
- SMEs
Bottom-up approach
OpenMinTeD works with 4 use cases, which provide requirements and evaluate the results:
RESEARCH ANALYTICS, LIFE SCIENCES, AGRICULTURE, SOCIAL SCIENCES
OpenMinTeD use case 1: Scholarly communication analytics
- Semantic search and discovery of open scientific outcomes
- Map of academia scholarly communication network
- Research monitoring and analytics
Partners: CORE/OU, OpenAIRE/ARC, Frontiers
OpenMinTeD use case 2: Life sciences
- Assisted curation of the EMBL-EBI chemical databases for metabolomics
- Curation of the neurosciences resources KnowledgeBase and Neurolex
Partners: EBI - Metabolomics, Human Brain Project
OpenMinTeD use case 3: Agriculture and biodiversity
- Enrich agricultural databases to assist food- and water-borne disease outbreak alerts and product recalls
- Image, figure and dataset discovery in AGRIS
Partners: INRA, AGRO-KNOW
OpenMinTeD use case 4: Social sciences
- Develop and evaluate methods for the automatic detection and linking of named entities, citation traces and intentions in social science scientific publications
Partners: GESIS
What can OpenMinTeD do for you?
- Are you a content provider? Make your content available for mining. Register your collections in the OpenMinTeD registry and let others discover them.
- Are you a TDM service provider? Share and collaborate with other TDM services. Register your TDM service in the OpenMinTeD registry and let others discover it.
- Are you a text miner or a researcher who can benefit from text mining? Use OpenMinTeD (when launched).
Conclusions
- The ability to text-mine research literature at scale can redefine the way we do research
- OpenMinTeD is laying the groundwork (interoperability) and building the cloud infrastructure for text-mining research literature
- Building an open, transparent infrastructure that enables others to participate
Contact us: www.openminted.eu
twitter.com/openminted_eu, facebook.com/openminted, bit.do/openmintedlinkedin, vimeo.com/openminted, bit.do/openmintedplus