Semantic MediaWiki (SMW) for Scientific Literature Management

Semantic MediaWiki (SMW) for Scientific Literature Management
Bahar Sateli, René Witte
Semantic Software Lab, Department of Computer Science and Software Engineering
Concordia University, Montréal
SMWCon Spring 2014
SMW for Scientific Literature Management 1 / 13

Outline
1. Introduction
2. Background
3. System Design
4. Application
5. Conclusion

Motivation
- Abundance of publications leads to bottlenecks in curating literature
- Existing bibliography management systems have limited content analysis support
- We need an environment that can encompass various activities of a researcher
We hypothesize that our tool can improve knowledge-intensive literature analysis tasks through a novel collaboration pattern between humans and AI assistants.
We envision a collaborative, wiki-based solution for the semantic management of research literature that integrates:
- a web-based interface
- semantic knowledge representation
- text mining for automatic content analysis

Related Work
Post-publication semantic analysis
- WikiPapers [1] uses Semantic Forms to collect literature focused on wiki research
- AcaWiki [2], designed to collect summaries and literature reviews of peer-reviewed academic research
Pre-publication semantic enrichment
- The SALT framework (Groza et al., 2007) uses custom LaTeX commands with explicit semantics
Our work is complementary to these efforts:
- We generate bibliographical and semantic entities using human-AI collaboration
- We transform papers into queryable artifacts, while remaining amenable to human-created semantic annotation
[1] WikiPapers, http://wikipapers.referata.com
[2] AcaWiki, http://www.acawiki.org

Natural Language Processing (NLP)
- A branch of AI that uses various techniques to process content written in natural language
- Multitude of NLP techniques exist, e.g.,
  - Named Entity Recognition (e.g., finding Persons, Organizations, etc.)
  - Quality Assessment
  - Summarization
- Various NLP APIs (e.g., OpenCalais, GATE, ...)
- Semantic Assistants framework
[Figure: Calling an NLP Service. Labels: Client (Word Processor); Parameter; Focused Summarization; Server; NLP Service 1, NLP Service 2, ..., NLP Service n; NLP Service Result]
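The client/server pattern named on this slide can be illustrated with a minimal sketch. It is not the Semantic Assistants API: the class `AssistantServer`, its methods, and the toy `focused_summary` service are all hypothetical names chosen for illustration, and the "summarizer" is only a keyword filter standing in for a real NLP pipeline.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical type: a service takes text plus parameters and returns a result string.
NlpService = Callable[[str, Dict[str, str]], str]


@dataclass
class AssistantServer:
    """Toy stand-in for a server that exposes several NLP services by name."""

    services: Dict[str, NlpService] = field(default_factory=dict)

    def register(self, name: str, service: NlpService) -> None:
        self.services[name] = service

    def list_services(self) -> List[str]:
        # A client would first ask which services are available.
        return sorted(self.services)

    def invoke(self, name: str, text: str, params: Dict[str, str] = None) -> str:
        # Dispatch the request to the named service and hand back its result.
        if name not in self.services:
            raise KeyError(f"unknown NLP service: {name}")
        return self.services[name](text, params or {})


def focused_summary(text: str, params: Dict[str, str]) -> str:
    """Illustrative only: keep the sentences that mention the 'topic' parameter."""
    topic = params.get("topic", "").lower()
    sentences = [s.strip() for s in text.split(".") if s.strip()]
    kept = [s for s in sentences if topic in s.lower()]
    return ". ".join(kept) + ("." if kept else "")
```

A client such as a word processor plug-in or a wiki would register nothing itself; it would only call `list_services()` and `invoke(...)`, passing the service name, the text, and any parameters.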

Requirements
- Centralized Repository of Knowledge (R1): The system must provide users with the ability to store raw data as well as any information generated by users and analysis tools.
- Automatic Text Analysis Support (R2): The proposed system must provide access to various NLP pipelines in a unified manner.
- Collaborative Analysis Environment (R3): The proposed system shall provide an environment where all researchers have access to the most up-to-date information and can keep track of content modifications.

System Architecture
Design Decisions:
- Wiki-based Collaborative Web Interface (R3)
- Semantic MediaWiki as a Knowledge Base (R1)
- Text Mining Pipelines for Literature Analysis (R2)
[Figure: system architecture. Labels: User; Zeeva Wiki; Web Server; MediaWiki Engine; Wiki Extensions; Semantic MediaWiki; Zeeva Facts; Database; Wiki NLP Component; NLP Service Connector; Semantic Assistants; Web Server; Service Invocation; Service Information; NLP Subsystem; Automatic Indexer; Readability Analysis; Wiki NLP Ontologies; Language Service Descriptions]

Semantic Metadata Extraction
Given a paper, we are interested in extracting:
- Structural Entities, i.e., parts of text that uniquely identify a paper, like title or author
- Semantic (Rhetorical) Entities, i.e., parts of text that describe the contributions, claims, findings and conclusions postulated by the paper's authors
[Figure: example facts stored in the Semantic Wiki Database: #paper_1234 :hasauthor #JohnDoe; #paper_1234 :hastitle "Towards a semantic..."]
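The subject/property/value facts in the figure can be sketched as a small data structure. The identifiers `#paper_1234`, `#JohnDoe`, `:hasauthor`, and `:hastitle` come from the slide; the `Triple` type and both helper functions are hypothetical, and rendering the facts with Semantic MediaWiki's `[[property::value]]` in-text annotation syntax is one assumed way such metadata might be written into a wiki page, not necessarily how Zeeva does it.

```python
from typing import List, NamedTuple


class Triple(NamedTuple):
    """One extracted fact: subject, property, value."""

    subject: str
    predicate: str
    obj: str


def paper_to_triples(paper_id: str, title: str, authors: List[str]) -> List[Triple]:
    # Structural entities: one title fact, then one author fact per author.
    triples = [Triple(paper_id, ":hastitle", title)]
    triples += [Triple(paper_id, ":hasauthor", author) for author in authors]
    return triples


def to_smw_annotations(triples: List[Triple]) -> List[str]:
    # Assumption: property names are reused verbatim, minus the ':' and '#' prefixes.
    return [f"[[{t.predicate.lstrip(':')}::{t.obj.lstrip('#')}]]" for t in triples]
```

For the paper shown on the slide, `paper_to_triples("#paper_1234", "Towards a semantic...", ["#JohnDoe"])` yields one `:hastitle` and one `:hasauthor` triple, which `to_smw_annotations` turns into annotation strings that could be placed on the paper's wiki page.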

Templating Mechanism

Automatic Processing of Publications

Querying the Zeeva Knowledge Base
- Transform Zeeva from a collaborative analysis environment to a knowledge base
- The generated semantic metadata can be used within the wiki or exported to external applications
- Semantic MediaWiki provides a simple inline query syntax, e.g.,
  NL Question: Give me all the contributions of Bahar Sateli.
  Corresponding SMW query:
Advantages:
1. The results queried by the system are always up-to-date
2. Lets users discover knowledge created by other users of the wiki
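The query shown on the original slide did not survive transcription. As a hedged illustration of the general shape of a Semantic MediaWiki query, the sketch below assembles an inline `{{#ask: ...}}` call and the equivalent `action=ask` API URL. The category `Contribution`, the property `hasauthor`, and the endpoint `https://wiki.example.org/api.php` are assumptions made for this example, not names taken from Zeeva.

```python
from typing import List
from urllib.parse import urlencode


def build_inline_ask(conditions: List[str], printouts: List[str]) -> str:
    """Assemble an inline #ask query from conditions and optional printouts."""
    parts = ["".join(f"[[{c}]]" for c in conditions)]
    parts += [f"?{p}" for p in printouts]
    return "{{#ask: " + " | ".join(parts) + " }}"


def build_api_url(api_endpoint: str, conditions: List[str], printouts: List[str]) -> str:
    """Build the URL for the same query through SMW's 'ask' API module."""
    query = "|".join(["".join(f"[[{c}]]" for c in conditions)] + [f"?{p}" for p in printouts])
    return api_endpoint + "?" + urlencode({"action": "ask", "query": query, "format": "json"})


# Hypothetical conditions for "all the contributions of Bahar Sateli".
CONDITIONS = ["Category:Contribution", "hasauthor::Bahar Sateli"]
```

With these assumed names, `build_inline_ask(CONDITIONS, [])` produces wiki markup that could be embedded in a page, while `build_api_url(...)` gives an external application a way to request the same results as JSON.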

Querying the Zeeva Knowledge Base

Summary and Future Work
- The main goal of this research is to evaluate whether concrete literature analysis tasks can be improved using state-of-the-art semantic technologies.
- We introduced Zeeva, an empirical wiki-based evaluation platform with an extensible architecture
Future work:
- Identify literature analysis tasks that can be improved with semantic technologies
- Develop more NLP services relevant to the context of literature analysis
- Perform an extrinsic evaluation of our hypothesis to assess the usability and efficiency of the proposed approach
http://www.semanticsoftware.info/zeeva