The Neuroscience Information Framework Practical experiences in using and building community ontologies


Maryann Martone, Ph.D.
University of California, San Diego

The Neuroscience Information Framework: Discovery and utilization of web-based resources for neuroscience
UCSD, Yale, Caltech, George Mason, Washington Univ.
http://neuinfo.org
Supported by NIH Blueprint
- A portal for finding and using neuroscience resources
- A consistent framework for describing resources
- Provides simultaneous search of multiple types of information, organized by category
- Supported by an expansive ontology for neuroscience
- Utilizes advanced technologies to search the hidden web

Where do I find...
- Data, software tools, materials, services, training, jobs, funding opportunities
- Websites, databases, catalogs, literature, supplementary material, information portals
- ...and how many are there?

NIF in action
- Example query: Hippocampus OR Cornu Ammonis OR Ammon's horn
- Data sources categorized by data type and level of nervous system
- Simplified views of complex data sources
- Query expansion: synonyms and related concepts
- Boolean queries
- Tutorials for using full resource when getting there from NIF
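The query expansion on this slide can be illustrated with a minimal sketch. It is not NIF's implementation: the synonym table and the function name `expand_query` are hypothetical, and only the three hippocampus labels come from the slide.

```python
from typing import Dict, List

# Hypothetical synonym table. The three labels come from the slide;
# the lookup key "hippocampus" is an illustrative assumption.
SYNONYMS: Dict[str, List[str]] = {
    "hippocampus": ["Hippocampus", "Cornu Ammonis", "Ammon's horn"],
}


def expand_query(term: str, synonyms: Dict[str, List[str]] = SYNONYMS) -> str:
    """Return a Boolean OR query covering a term and its known synonyms."""
    # Unknown terms fall back to the term itself, unexpanded.
    labels = synonyms.get(term.strip().lower(), [term])
    # Quote multi-word labels so each one is matched as a phrase.
    quoted = [f'"{label}"' if " " in label else label for label in labels]
    return " OR ".join(quoted)


print(expand_query("Hippocampus"))
```

A term with no entry in the table is passed through unchanged, so a plain string search still works when the lexicon has no match.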

NIF searches across multiple sources of information
- NIF data federation: independent databases registered with NIF or through web services
- NIF registry: catalog
- NIF Web: custom web index (work in progress)
- NIF literature: neuroscience-centered literature corpus (Textpresso)

Guiding principles of NIF
- Builds heavily on existing technologies (open source tools)
- Information resources come in many sizes and flavors
- Framework has to work with resources as they are, not as we wish them to be
  - Federated system; resources will be independently maintained
  - Very few use standard terminology or map to ontologies
- No single strategy will work for the current diversity of neuroscience resources
- Trying to design the framework so it will be as broadly applicable as possible to those who are trying to develop technologies
- Interface neuroscience to the broader life science community
- Take advantage of emerging conventions in search and

Registering a Resource to NIF
- Level 1: NIF Registry: high-level descriptions from NIF vocabularies supplied by human curators
- Level 2: Access to deeper content; mechanisms for query and discovery; DISCO protocol
- Level 3: Direct query of web-accessible database
  - Automated registration
  - Mapping of database content to NIF vocabulary by a human

The NIF Registry
- Very, very simple model
- Annotated with NIF vocabularies
  - Resource type
  - Organism
- Reviewed by NIF curators

Modular ontologies for neuroscience
- NIFSTD modules (from the slide diagram): Organism, Macroscopic Anatomy, Cell, NS Dysfunction, Quality, Molecule, Subcellular Anatomy, NS Function, Resource, Investigation, Macromolecule, Gene, Techniques, Instruments, Molecule Descriptors, Reagent, Protocols
- NIF covers multiple structural scales and domains of relevance to neuroscience
- Incorporated existing ontologies where possible, extending them for neuroscience where necessary
- Having community ontologies allowed NIF to implement terminology services very quickly
- Normalized under the Basic Formal Ontology: an upper ontology used by the OBO Foundry
- Based on BIRNLex: neuroscientists didn't like too many choices
- Bill Bug

How are ontologies used?
- Search: query expansion
  - Synonyms
  - Related classes
  - Concept-based queries
- Annotation
  - Resource categorization
  - Entity mapping
- Ranking of results
- NIF Registry; NIF Web

Concept-based search: Entity mapping
- Example: "Brodmann area 3" vs. "Brodmann.3"
- NIF indexes over 18 million records; most are not annotated
- Synonyms and explicit mapping of database content help smooth over terminology differences and custom terminologies
- NIF TIS: Ontology mappings

Concept-based query: GABAergic neuron
- Not what I say; what I mean
- Simple search will not return examples of GABAergic neurons unless the data are explicitly tagged as such
- Too many possible classifications to get them all by keywords
- Classes are logically defined in the NIF ontology, i.e., a GABAergic neuron is any neuron that uses GABA as a neurotransmitter
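As a rough illustration of explicit entity mapping, the sketch below resolves the two spellings from the slide to one concept. The identifier `example:BrodmannArea3`, the table `EXPLICIT_MAP` and both function names are hypothetical and are not NIF identifiers or APIs.

```python
import re
from typing import Dict, Optional

# Hypothetical concept identifier, used only for illustration.
BA3 = "example:BrodmannArea3"

# Explicit mappings from source-specific strings to a single concept.
# The two spellings are the ones shown on the slide.
EXPLICIT_MAP: Dict[str, str] = {
    "brodmann area 3": BA3,
    "brodmann.3": BA3,
}


def normalize(label: str) -> str:
    """Lower-case a label and collapse runs of whitespace."""
    return re.sub(r"\s+", " ", label.strip().lower())


def map_entity(label: str) -> Optional[str]:
    """Return the concept for a database string, or None if it is unmapped."""
    return EXPLICIT_MAP.get(normalize(label))


print(map_entity("Brodmann area 3") == map_entity("Brodmann.3"))
```

Records that carry either spelling can then be retrieved by one concept-level query, while unmapped strings return `None` and can be queued for a human curator.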

Building NIF ontologies: Balancing act
- Different schools of thought as to how to build vocabularies and ontologies
- NIF is trying to navigate these waters, keeping in mind:
  - NIF is for both humans and machines
  - Our primary concern is data
  - We have to meet the needs of the community
  - We have a budget and deadlines
- Building ontologies is difficult even for limited domains, never mind all of neuroscience, but we've learned a few things:
  - Reuse what's there: try to reuse URIs rather than map when possible
  - Make what you do reusable: adopt best practices where feasible (numerical identifiers, unique labels, single asserted simple hierarchies)
  - Engage the community
  - Avoid religious wars: separate the science from the informatics
  - Start simple and add more complexity to support reuse by other communities and to confront the realities of ontology construction

Adding complexity
- Step 1: Build core lexicon (NeuroLex)
  - Classes and their definitions
  - Import from existing ontologies (but only those classes we need)
  - Simple single inheritance and non-controversial hierarchies
  - Each module covers only a single domain
  - Understandable by an average human
- Step 2: NIFSTD: standardize modules under the same upper ontology
  - OBO compliant in OWL
  - Feed classes to ontology creators
- Step 3: Create intra-domain and more useful hierarchies using properties and restrictions
  - Brain partonomy
- Step 4: Bridge two or more domains using a standard set of relations
  - Contained in a separate file called a bridge file
  - Neuron to brain region
  - Neuron to molecule, e.g., GABAergic neuron

Maintaining multiple versions
NIF maintains the NIF vocabularies in different forms for different purposes:
- NeuroLex Wiki: lexicon for community review and comment
- NIFSTD: set of modular OWL files normalized under BFO and available for download
- NCBO BioPortal for visibility and mapping services
- Ontoquest: NIF's ontology server
  - Relational store customized for OWL ontologies
  - Materialized inferred hierarchies for more efficient queries

NeuroLex Wiki
- The human interface
- Semantic MediaWiki
- http://neurolex.org
- Stephen Larson
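The idea of materializing inferred hierarchies can be sketched as precomputing every ancestor of every class, so that a subclass question becomes a lookup instead of a graph traversal at query time. This is only an in-memory illustration under assumed toy data; Ontoquest itself is described on the slide as a relational store, and its schema is not shown in the source.

```python
from typing import Dict, Set

# Toy asserted hierarchy (class -> direct parents); names are illustrative.
ASSERTED: Dict[str, Set[str]] = {
    "Purkinje cell": {"Neuron"},
    "Neuron": {"Cell"},
    "Cell": set(),
}


def materialize(asserted: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Precompute all ancestors of each class (assumes an acyclic hierarchy)."""
    closure: Dict[str, Set[str]] = {}

    def ancestors(cls: str) -> Set[str]:
        if cls not in closure:
            result: Set[str] = set()
            for parent in asserted.get(cls, set()):
                result.add(parent)
                result |= ancestors(parent)
            closure[cls] = result
        return closure[cls]

    for cls in asserted:
        ancestors(cls)
    return closure


CLOSURE = materialize(ASSERTED)


def is_a(cls: str, ancestor: str) -> bool:
    """Answer a subclass query with a single set lookup."""
    return ancestor in CLOSURE.get(cls, set())


print(is_a("Purkinje cell", "Cell"))
```

The trade-off is the usual one for materialized views: queries get cheaper, but the closure has to be rebuilt whenever the asserted hierarchy changes.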

NeuroLex: OBO lite
- More human centric
- More stable class structure
  - Ontologies take time; many versions are retired, which is not good for information systems that are using identifiers
- Synonyms and abbreviations were essential for users
  - Can't annotate if they can't find it
  - Can't use it for search if they can't find it
- Used terms that humans could understand
  - Even if they were plural (meninges vs. meninx)
- Facilitates semi-automated mapping
- Contains subsets of ontologies that are useful to neuroscientists
  - e.g., only classes in ChEBI that neuroscientists use
- Wanted the community to be able to see it and use it
- Simple understandable hierarchies
  - Removed the independent continuants, entity, etc.
  - Used them for guidance but we often lack the time and expertise to figure out distinctions

NIF Resource Descriptors
- Working with NCBC (Biomedical Resource Ontology) and NITRC to come up with a single resource ontology and information model
- Reconciling current versions; moving forward jointly
- Same classes, different views
- Peter Lyster, Csongor Nyulas, David Kennedy, Maryann Martone, Anita Bandrowski

Building from the NIF
- NIFSTD module: NIF Investigation
  - Based on BFO and OBI (now apparently IAO)
  - Contains objects that are related to resource types, e.g., software application; instrument
- NIFSTD module: NIF Resource
  - Resource categories, e.g., software resource
  - Objects from NIFSTD reclassified under resource categories using very simple logical restrictions
- NIF resource browser interface
  - Assigns alternate labels that are easier for users to understand

- NIFSTD: Investigation
- NIFSTD: Resource (defined by restrictions in terms of NIFSTD Investigation): Structured knowledge source
- NIF portal user interface: Display label = Database or ontology

One area where NIF is actively building an ontology: Neurons and glia
- Classify them based on properties: many and varied

NIF Cell Ontology
- NIFSTD bridge file: NIF cell, NIF molecule, NIF anatomy
- Define a set of properties that relate neuron classes to molecule classes, e.g., Neuron has neurotransmitter
  - Purkinje cell is a Neuron
  - Purkinje cell has neurotransmitter GABA
  - Neuron has part soma
  - Soma is part of brain region
- Reclassification based on logical definitions:
  - GABA neuron is any member of the class neuron that has neurotransmitter GABA
  - Hippocampal neuron is any member of the class neuron whose soma is part of any part of the hippocampal formation

Working with community ontologies: NIF Molecule
- Intent: ChEBI, PRO
  - ChEBI too large and undergoing reorganization; PRO not ready and didn't have any molecules that neuroscientists cared about, e.g., ion channels
  - Imported IUPHAR classification of ion channels and G protein-coupled receptors
  - Gave to PRO but they gave new identifiers
- Neuroscientists less concerned about structure than function, roles, dispositions, e.g., potassium channels, neuropeptides, drugs of abuse
  - Can't get those from current hierarchies
  - We can't really tell the difference
  - Relations ontology evolving slowly
- Solution: Take ChEBI, PRO, NIF and add relations in a bridge file to allow us to do what we need for our constituents
- NIF Molecule:
  - Classes imported from ChEBI as needed (MIREOT approach)
  - Created molecule roles
  - Assign: "has role" relationship in NIF (simplification of a complex expression)
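The two logical definitions on the NIF Cell Ontology slide can be mimicked in a few lines. This is a toy sketch, not an OWL reasoner and not NIF code. The Purkinje cell statements follow the slide; the CA1 pyramidal cell, the soma locations and the part-of chain are illustrative assumptions added to exercise the second definition.

```python
from typing import Dict, Optional, Set

# Asserted facts. The Purkinje cell entries paraphrase the slide;
# everything about "CA1 pyramidal cell" is an illustrative assumption.
IS_A: Dict[str, str] = {
    "Purkinje cell": "Neuron",
    "CA1 pyramidal cell": "Neuron",
}
HAS_NEUROTRANSMITTER: Dict[str, Set[str]] = {
    "Purkinje cell": {"GABA"},
    "CA1 pyramidal cell": {"Glutamate"},
}
# Region containing the soma of each cell class (assumed for the example).
SOMA_PART_OF: Dict[str, str] = {
    "Purkinje cell": "Purkinje cell layer",
    "CA1 pyramidal cell": "CA1",
}
# Brain partonomy, part -> whole (assumed for the example).
PART_OF: Dict[str, str] = {
    "CA1": "Hippocampus",
    "Hippocampus": "Hippocampal formation",
    "Purkinje cell layer": "Cerebellar cortex",
}


def is_part_of(region: Optional[str], whole: str) -> bool:
    """True if region equals whole or is a transitive part of it."""
    seen: Set[str] = set()
    while region is not None and region not in seen:
        if region == whole:
            return True
        seen.add(region)
        region = PART_OF.get(region)
    return False


def inferred_classes(cell: str) -> Set[str]:
    """Apply the two logical definitions from the slide to one cell class."""
    classes: Set[str] = set()
    if IS_A.get(cell) != "Neuron":
        return classes
    # GABA neuron: any neuron that has neurotransmitter GABA.
    if "GABA" in HAS_NEUROTRANSMITTER.get(cell, set()):
        classes.add("GABA neuron")
    # Hippocampal neuron: any neuron whose soma lies within the hippocampal formation.
    if is_part_of(SOMA_PART_OF.get(cell), "Hippocampal formation"):
        classes.add("Hippocampal neuron")
    return classes


print(inferred_classes("Purkinje cell"))
print(inferred_classes("CA1 pyramidal cell"))
```

Neither cell class is tagged "GABA neuron" or "Hippocampal neuron" in the asserted data; both memberships follow from the properties alone, which is the point of keeping a single asserted hierarchy and inferring the rest.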

NIF Dysfunction
- Did not import DO in previous form as it was being redone
- Created NIF Dysfunction from nervous system diseases from MeSH, NINDS terminology and OMIM
  - ~350 classes
  - Primarily classified according to region affected (eye vs. nervous system) and then symptoms
- Have not reorganized according to latest BFO; waiting for guidance on this
- Do need conditions beyond diseases though
  - Mental illness
  - Addiction
  - Obesity

Disease phenotype ontologies
- Groups (us included) are building phenotype ontologies from the basic NIF modules
- Application ontologies
- Diagram: the classes Phenotype, Organism, Entity, Information Artifact, Protocol, Disease, Stage and Quality, linked by the relations inheres in, has part, participates in, is bearer of, derives into and has stage
- Sarah Maynard, Chris Mungall, Suzie Lewis

The search for animal models starts here
- A directory of available disease models
- A single search across all known animal model databases: reference materials, information about the resources, and information about the animal models themselves
- Start with mouse, zebrafish, and non-human primates
- Expand to other species and along other dimensions
- Build on existing ontologies such as NeuroLex

Summary
- NIF has tried to adopt a flexible, practical approach to assembling, extending and using community ontologies
  - We use a combination of search strategies: string-based, lexicon-based, ontology-based
  - Some strategies are based on practicalities, e.g., limitations with working with current tools, but which we hope will support orderly evolution
  - Lots of legacy slop
- We believe in modularity
- We believe in starting simple and adding complexity so that we can all work off the same base and add customization as needed while ontologists and scientists are grappling with the nature of reality
- We believe in single asserted hierarchies and multiple inferred hierarchies
- Using upper ontologies enforces clarity of thinking and can serve as useful guides to dedicated domain scientists, but some issues require more thought than we can give
- We are establishing processes for working with ontology providers, but it has to be two ways
  - Stable identifiers are key
  - We have added a lot of value to the base ontologies, some of which may not pass muster but which is necessary for us to satisfy our constituents and funders
- Ontologies have to satisfy two communities: naïve users and ontologists; not easy to do both with current tools
- Lexicons, thesauri, controlled vocabularies, ontologies: all have their place. Would like them to work together.

NIF Evolution
(Timeline figure: V1.0, NIF 1.5 and beyond; axis labels: Then, Now, Later)