An overview of ontology technology
Languages and tools for building and using ontologies
Simon Jupp, James Malone (jupp@ebi.ac.uk, malone@ebi.ac.uk)
Outline
Languages: OWL and OBO (classes, individuals, relations, labels, annotations; differences between the languages)
Developing ontologies (Protégé)
Viewing ontologies (BioPortal, OLS)*
Tools for data annotation (Annotator, Whatizit, Zooma)*
Ontologies in applications (Atlas, KupKB)*
Goal: to think about which ontologies you might use for a given set of data, and what criteria you would use to decide.
Terminology, classification and coding standards
Linnaeus: 18th-century classification of species.
There is a long tradition of building classification systems for medicine, e.g. Bertillon's 19th-century classification of causes of death, which later became the International Classification of Diseases (ICD).
Others include SNOMED CT and MeSH.
WordNet: a lexical database.
Cyc: a knowledge base of common-sense knowledge.
Knowledge representation systems
Building terminologies presents some major challenges: sharing terminologies, interoperability of terminologies between applications, managing consistency, and querying and inferring relationships.
There is a long history of research on this in computer science: rule-based systems / expert systems, frame-based systems, first-order logic and description logics.
Look away now!
Ontology lingua franca
Lots of terminology is borrowed from philosophy, mathematics and computer science.
Languages used in ontologies have basic components:
Classes / Concepts / Terms / Types
Individuals / Instances / Members
Relationships / Properties / Roles
In addition, there are some useful human-friendly components:
Labels (for the names of things)
Annotations (for things like human-readable definitions)
From Description Logics to OWL
DLs are a family of knowledge representation languages with precisely defined semantics for automated reasoning.
They handle decidable inference problems, with efficient reasoning algorithms.
1990s: several languages for representing DLs, including DAML and OIL.
2001: a W3C working group is formed to develop a standard.
2004: the OWL (Web Ontology Language) specification is released.
Lovely OWL: squeeze the lovely axioms out of him
http://www.w3.org/TR/owl-features/
Well, almost
OWL constructs
OWL gives you a standard for describing ontologies. There are lots of constructs, but you probably won't need them all:
Class axioms: subclasses, equivalent classes, and classes built by intersection, union and complement
Individual assertions: types, relationships to other individuals
Relationships (properties): functional, transitive, symmetric and more
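OWL's class constructs have a set-theoretic reading: a class denotes a set of individuals, subclass means subset, and intersection, union and complement behave like the corresponding set operations. The toy Python sketch below illustrates that reading for one fixed, hypothetical world (the individuals and classes are invented). It is only an intuition aid: real OWL entailment considers all possible interpretations, not a single one.

```python
# Toy illustration of OWL's set-theoretic semantics in ONE fixed interpretation.
# All individuals and classes are hypothetical.
domain = {"nemo", "dory", "bruce", "rex"}  # every individual in this toy world

fish = {"nemo", "dory", "bruce"}
shark = {"bruce"}
pet = {"nemo", "rex"}

def is_subclass(sub, sup):
    """SubClassOf(sub, sup) holds here when every member of sub is in sup."""
    return sub <= sup

pet_fish = fish & pet       # ObjectIntersectionOf(Fish, Pet)
fish_or_pet = fish | pet    # ObjectUnionOf(Fish, Pet)
not_fish = domain - fish    # ObjectComplementOf(Fish), relative to the domain

print(is_subclass(shark, fish))  # True
print(sorted(pet_fish))          # ['nemo']
print(sorted(fish_or_pet))       # ['bruce', 'dory', 'nemo', 'rex']
print(sorted(not_fish))          # ['rex']
```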
OWL syntax
There are many syntaxes for representing an OWL ontology.
The W stands for Web: OWL's primary syntax is built on the stack of web technologies defined by the W3C, with XML at the bottom of the stack and then RDF (but more on that tomorrow).
Working with OWL
Luckily, we have tools for working with OWL.
Protégé (http://protege.stanford.edu): open-source software for editing OWL ontologies
OWL API - http://owlapi.sourceforge.net
We can even use spreadsheets: Populous (http://populous.org.uk)
OWL reasoners
This is where the real strength of OWL lies.
Reasoners (e.g. ELK) infer statements that are not explicitly stated in the ontology, through subsumption, equivalence, consistency and instantiation testing.
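To make "infer statements that are not explicitly stated" concrete, here is a naive Python sketch, assuming an ontology reduced to named-class SubClassOf and DisjointClasses axioms (all class names are hypothetical). It computes the transitive closure of the told hierarchy and flags classes that fall under two disjoint classes, i.e. unsatisfiable classes. Real reasoners such as ELK handle full class expressions with far more sophisticated algorithms; this is not how they are implemented.

```python
from collections import defaultdict

# Hypothetical told axioms: (subclass, superclass) pairs.
SUBCLASS_OF = [
    ("Hepatocyte", "EpithelialCell"),
    ("EpithelialCell", "Cell"),
    ("Neuron", "Cell"),
    ("WeirdCell", "Hepatocyte"),
    ("WeirdCell", "Neuron"),
]
# Hypothetical DisjointClasses axiom.
DISJOINT = [("EpithelialCell", "Neuron")]

def superclasses(axioms):
    """Reflexive-transitive closure of the told subclass relation."""
    direct = defaultdict(set)
    for sub, sup in axioms:
        direct[sub].add(sup)
    classes = {c for pair in axioms for c in pair}
    closure = {}
    for cls in classes:
        seen, stack = {cls}, [cls]
        while stack:
            for sup in direct[stack.pop()]:
                if sup not in seen:
                    seen.add(sup)
                    stack.append(sup)
        closure[cls] = seen
    return closure

def unsatisfiable(closure, disjoint):
    """A class is unsatisfiable if it falls under both of two disjoint classes."""
    return sorted(
        cls for cls, sups in closure.items()
        if any(a in sups and b in sups for a, b in disjoint)
    )

closure = superclasses(SUBCLASS_OF)
print("Cell" in closure["Hepatocyte"])   # True: inferred, never stated directly
print(unsatisfiable(closure, DISJOINT))  # ['WeirdCell']
```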
Reasoner plugins for Protégé
Showing unsatisfiable classes
Getting explanations
Inferring subsumption
Defining classes
After classification with a reasoner
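The idea behind defining classes can be sketched as follows: a defined class is equivalent to a set of conditions, and any class whose stated conditions include all of them is placed underneath it after classification. The Python sketch below assumes a heavily simplified, EL-style structural check in which conditions are named superclasses or "property some Filler" restrictions; all names are hypothetical. Unlike a real reasoner it does not use the hierarchy when comparing conditions (e.g. "part_of some LeftVentricle" would not match "part_of some Heart").

```python
# Conditions are either a named class (str) or an existential restriction
# written as a tuple, e.g. ("part_of", "Heart") for "part_of some Heart".
# All class names are hypothetical.
DEFINED = {
    # HeartPart EquivalentTo: AnatomicalEntity and (part_of some Heart)
    "HeartPart": {"AnatomicalEntity", ("part_of", "Heart")},
}
PRIMITIVE = {
    "MitralValve": {"AnatomicalEntity", ("part_of", "Heart"), "Valve"},
    "Femur": {"AnatomicalEntity", ("part_of", "Leg")},
}

def classify(defined, primitive):
    """Place each primitive class under every defined class whose conditions it meets."""
    inferred = {}
    for cls, conditions in primitive.items():
        inferred[cls] = sorted(
            name for name, definition in defined.items() if definition <= conditions
        )
    return inferred

print(classify(DEFINED, PRIMITIVE))
# {'MitralValve': ['HeartPart'], 'Femur': []}
```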
Alternatives to OWL
Simple Knowledge Organization System (SKOS): http://www.w3.org/2004/02/skos/
A W3C standard for thesauri, classification schemes, taxonomies, subject headings and other types of controlled vocabulary.
Lighter-weight semantics, with less ability to do reasoning and consistency checking.
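To show what "lighter-weight" looks like in practice, here is a Python sketch of a SKOS-style concept scheme held in plain dictionaries, assuming one skos:prefLabel, any number of skos:altLabel values and skos:broader links per concept (the concept IDs are hypothetical). It supports label lookup and walks broader links upwards, which is what skos:broaderTransitive describes; note that skos:broader itself is not declared transitive. There is no notion of consistency checking here.

```python
# Hypothetical SKOS-style concept scheme.
CONCEPTS = {
    "ex:neoplasm": {"prefLabel": "Neoplasm", "altLabel": ["Tumour", "Tumor"],
                    "broader": []},
    "ex:bone_neoplasm": {"prefLabel": "Bone neoplasm", "altLabel": [],
                         "broader": ["ex:neoplasm"]},
    "ex:osteosarcoma": {"prefLabel": "Osteosarcoma", "altLabel": ["Osteogenic sarcoma"],
                        "broader": ["ex:bone_neoplasm"]},
}

def find_by_label(text):
    """Match a string against preferred and alternative labels, ignoring case."""
    needle = text.casefold()
    return [cid for cid, c in CONCEPTS.items()
            if needle == c["prefLabel"].casefold()
            or needle in (alt.casefold() for alt in c["altLabel"])]

def broader_transitive(cid):
    """Follow skos:broader links upwards, collecting every ancestor once."""
    result, stack = [], list(CONCEPTS[cid]["broader"])
    while stack:
        parent = stack.pop()
        if parent not in result:
            result.append(parent)
            stack.extend(CONCEPTS[parent]["broader"])
    return result

print(find_by_label("osteogenic sarcoma"))    # ['ex:osteosarcoma']
print(broader_transitive("ex:osteosarcoma"))  # ['ex:bone_neoplasm', 'ex:neoplasm']
```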
So what about OBO?
Open Biomedical Ontologies (OBO) language
Born out of the needs of the life sciences.
Less support for reasoning, better support for metadata, and a human-readable syntax.
Initially developed for the Gene Ontology.
The OBO Foundry is now a coordinated effort to build a wide range of reference ontologies for the life sciences: http://www.obofoundry.org
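The human-readable OBO syntax is a flat file of tag-value stanzas, so a basic reader is short. The Python sketch below parses [Term] stanzas from a small snippet modelled on Gene Ontology terms. It is a simplification, assuming one "tag: value" pair per line and treating " !" as the start of a trailing comment; it ignores escapes, qualifiers and other stanza types such as [Typedef], so use a proper OBO library for real files.

```python
OBO_TEXT = """\
format-version: 1.2

[Term]
id: GO:0008150
name: biological_process

[Term]
id: GO:0008152
name: metabolic process
synonym: "metabolism" EXACT []
is_a: GO:0008150 ! biological_process
"""

def parse_obo_terms(text):
    """Parse [Term] stanzas into dicts mapping each tag to a list of values."""
    terms, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("!"):
            continue
        if line.startswith("["):
            # A new stanza starts; only [Term] stanzas are kept in this sketch.
            current = {} if line == "[Term]" else None
            if current is not None:
                terms.append(current)
            continue
        if current is None:
            continue  # header tags or non-Term stanzas
        tag, _, value = line.partition(":")
        value = value.split(" !", 1)[0].strip()  # drop a trailing "! comment"
        current.setdefault(tag.strip(), []).append(value)
    return terms

for term in parse_obo_terms(OBO_TEXT):
    print(term["id"][0], term["name"][0], term.get("is_a", []))
```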
OBO-Edit: tool support for OBO
OBO and OWL
There have been several attempts to map between the languages.
OBO has a mapping to a subset of OWL, so OBO is essentially another OWL syntax; a few OBO constructs are not possible in OWL.
The OBO Ontology Release Tool (OORT) round-trips between OBO and OWL.
Use OWL for reasoning and checking, and OBO for development.
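A small piece of the OBO-to-OWL mapping can be sketched directly: OBO IDs become IRIs under the OBO PURL namespace (GO:0008152 becomes http://purl.obolibrary.org/obo/GO_0008152), name becomes rdfs:label and is_a becomes rdfs:subClassOf. The Python sketch below emits those triples as N-Triples for one term. It covers only these three tags, assumes a plain prefix:local ID, and is not how OORT is implemented; the full mapping also handles relations such as part_of, synonyms, definitions and more.

```python
OBO_PURL = "http://purl.obolibrary.org/obo/"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_CLASS = "http://www.w3.org/2002/07/owl#Class"

def obo_id_to_iri(obo_id):
    """Turn a prefixed OBO ID such as GO:0008152 into an OBO PURL IRI."""
    prefix, local = obo_id.split(":", 1)
    return f"{OBO_PURL}{prefix}_{local}"

def term_to_ntriples(obo_id, name, parents):
    """Emit N-Triples for one term: class declaration, label and is_a links."""
    subject = f"<{obo_id_to_iri(obo_id)}>"
    literal = name.replace("\\", "\\\\").replace('"', '\\"')  # escape the label
    lines = [
        f"{subject} <{RDF_TYPE}> <{OWL_CLASS}> .",
        f'{subject} <{RDFS}label> "{literal}" .',
    ]
    for parent in parents:
        lines.append(f"{subject} <{RDFS}subClassOf> <{obo_id_to_iri(parent)}> .")
    return lines

for triple in term_to_ntriples("GO:0008152", "metabolic process", ["GO:0008150"]):
    print(triple)
```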
Summary
Ontology brings together multiple disciplines, which can be overwhelming, and there is lots of opinion (we have lots).
OWL, reasoners and ontology editors are all maturing technologies, and much of the software is academic/research software.
Developing good ontologies is a lot like developing good software.
Viewing ontologies in BioPortal: http://bioportal.bioontology.org/
A public repository of biomedical ontologies, hosted at the National Center for Biomedical Ontology (NCBO), Stanford University.
426 ontologies covering around 5.8 million terms, though many of these will be duplicates.
Allows uploads of OWL and OBO ontologies.
Viewing ontologies in OLS: http://www.ebi.ac.uk/ontology-lookup
The Ontology Lookup Service (OLS), based at the EBI.
Browser based on the OBO versions of ontologies only.
Web site and web service search facility.
Visualisation of terms in a graph showing common OBO relations such as is_a and part_of.
Semantic Web search: Swoogle, http://swoogle.umbc.edu/
Indexes semantic web data and ontologies (more on the semantic web tomorrow).
Can return results in a Google-like way.
Not curated: caveat emptor.
Working out differences between ontologies
Bubastis: www.ebi.ac.uk/efo/bubastis
An OWL ontology can be serialised as a collection of triples with no requirement on ordering, so a traditional line-by-line diff does not work.
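The reason a traditional diff fails, and the basic fix, can be shown in a few lines: compare the two ontologies as sets of statements rather than as ordered lines. The Python sketch below does this for two small, hypothetical N-Triples documents in which one line is merely reordered and one statement genuinely changes; only the real change is reported. It assumes one triple per line and no blank nodes (whose labels are arbitrary and would need special handling), and it is not a description of how Bubastis itself works.

```python
SUB = "<http://www.w3.org/2000/01/rdf-schema#subClassOf>"

# Hypothetical "before" and "after" versions of a tiny ontology.
OLD = f"""\
<http://example.org/B> {SUB} <http://example.org/A> .
<http://example.org/C> {SUB} <http://example.org/A> .
"""
NEW = f"""\
<http://example.org/C> {SUB} <http://example.org/B> .
<http://example.org/B> {SUB} <http://example.org/A> .
"""

def triples(text):
    """Collect non-empty, non-comment lines as a set, normalising whitespace."""
    return {" ".join(line.split()) for line in text.splitlines()
            if line.strip() and not line.lstrip().startswith("#")}

def diff(old, new):
    """Return (added, removed) statements; line order plays no role."""
    old_set, new_set = triples(old), triples(new)
    return sorted(new_set - old_set), sorted(old_set - new_set)

added, removed = diff(OLD, NEW)
print("added:", added)
print("removed:", removed)
```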
Ontologies in applications
Programming libraries exist to help you use ontologies.
The OWL API is a Java library for creating, manipulating and serialising OWL ontologies.
It provides mechanisms for asking for class annotations, subclasses, parent classes and axioms.
It contains parsers and writers for several formats, including RDF/XML, OWL/XML, Turtle and OBO.
Apache Jena can also handle OWL ontologies.
Ontologies in applications
The OWL API is used in several applications: Expression Atlas uses it to produce a tree browser and for query expansion.
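Query expansion with an ontology usually means that a search for a term also matches data annotated with any of its descendants. The Python sketch below shows that idea over a hypothetical is_a hierarchy; it is an illustration of the technique, not Expression Atlas's actual code, which works through the OWL API on a full ontology.

```python
from collections import defaultdict

# Hypothetical (child, parent) is_a pairs.
IS_A = [
    ("hepatocellular carcinoma", "liver cancer"),
    ("hepatoblastoma", "liver cancer"),
    ("liver cancer", "cancer"),
    ("breast cancer", "cancer"),
]

def expand_query(term, is_a):
    """Return the query term followed by all of its descendants."""
    children = defaultdict(list)
    for child, parent in is_a:
        children[parent].append(child)
    expanded, stack = [term], [term]
    while stack:
        for child in children[stack.pop()]:
            if child not in expanded:
                expanded.append(child)
                stack.append(child)
    return expanded

print(expand_query("liver cancer", IS_A))
# ['liver cancer', 'hepatocellular carcinoma', 'hepatoblastoma']
```

A sample annotated only with "hepatoblastoma" would then be returned by a search for "liver cancer".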
Tools for data annotation
Effectively annotating data with ontology terms is an open research problem.
The primary consideration is the trade-off between coverage and precision, i.e. is the goal blanket coverage or high accuracy in the annotations?
Determining the balance is key, and it depends on the use case:
For patient medical records you may want high precision, even at the cost of low coverage.
For tagging a paper, high coverage at the cost of lower precision may be acceptable.
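The coverage/precision trade-off can be made measurable once you have a small hand-checked reference set. In the Python sketch below, precision is the fraction of proposed (record, term) annotations that are correct, and coverage is the fraction of records that received at least one annotation. These definitions, the record IDs and the term strings are all illustrative assumptions; pick metrics that match your own use case.

```python
# Hypothetical hand-checked annotations and records.
GOLD = {("rec1", "osteosarcoma"), ("rec2", "biopsy"),
        ("rec3", "necrosis"), ("rec4", "Homo sapiens")}
ITEMS = {"rec1", "rec2", "rec3", "rec4"}

def evaluate(predicted, gold, items):
    """Return (precision, coverage) for a set of (record, term) annotations."""
    annotated = {item for item, _ in predicted}
    coverage = len(annotated & items) / len(items)
    precision = len(predicted & gold) / len(predicted) if predicted else 0.0
    return precision, coverage

# A cautious annotator versus a greedy one.
strict = {("rec1", "osteosarcoma"), ("rec3", "necrosis")}
greedy = {("rec1", "osteosarcoma"), ("rec1", "sarcoma"), ("rec2", "biopsy"),
          ("rec2", "microarray"), ("rec3", "cell death"), ("rec4", "Homo sapiens")}

print(evaluate(strict, GOLD, ITEMS))  # (1.0, 0.5): precise, partial coverage
print(evaluate(greedy, GOLD, ITEMS))  # (0.5, 1.0): full coverage, noisier
```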
Tools for data annotation: NCBO Annotator, http://bioportal.bioontology.org/annotator
The NCBO Annotator can draw on all the ontologies in BioPortal to match text to ontology labels and synonyms.
It uses a combination of metrics to match, including:
Semantic distance: matching meaning
Ontological distance: using is_a closures
Ontology mappings: two mapped terms may mean the same thing
It produces a set of ontology classes that match parts of the entered text.
There is no scoring or ranking mechanism, so there is some overhead in turning the results into annotations.
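The core of label-and-synonym matching is a dictionary lookup over the text. The Python sketch below finds whole-word, case-insensitive occurrences of each label or synonym in a hypothetical lexicon and, like the output described above, returns an unranked list of hits with their positions. It is a sketch of the general technique only, not the NCBO Annotator's implementation, and it omits the semantic distance, is_a closure and mapping steps.

```python
import re

# Hypothetical lexicon: class ID -> label followed by synonyms.
LEXICON = {
    "EX:0001": ["osteosarcoma", "osteogenic sarcoma"],
    "EX:0002": ["biopsy"],
    "EX:0003": ["necrosis"],
}

def annotate(text, lexicon):
    """Return (class_id, matched_text, start, end) for every label/synonym hit."""
    hits = []
    for class_id, names in lexicon.items():
        for name in names:
            pattern = r"\b" + re.escape(name) + r"\b"  # whole words only
            for m in re.finditer(pattern, text, flags=re.IGNORECASE):
                hits.append((class_id, m.group(0), m.start(), m.end()))
    return sorted(hits, key=lambda hit: hit[2])  # order by position, no scoring

text = "Osteosarcoma biopsy showed 60% necrosis"
for hit in annotate(text, LEXICON):
    print(hit)
```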
Tools for Data Annotation WhatIzIt http://www.ebi.ac.uk/webservices/whatizit/
But noise remains
Terminizer (http://terminizer.org/): limited to OBO Foundry ontologies
There is much overlap between bio-ontologies
Which bio-ontologies?
There are a number of tools for viewing and editing ontologies and building them into applications, and most are ontology-agnostic.
So how does one find the ontology (or ontologies) that best fits one's needs?
What is a good ontology? This is an open research problem, with many different opinions.
We need to identify our needs first: what are they, in and of themselves?
How can we measure the ontologies against these needs? We use competency questions and use cases.
Use cases
A familiar paradigm to those working in software engineering, where a use case is typically an interaction between a user and a system to achieve an objective.
In ontology engineering, use cases can be subdivided into:
Interaction with data and ontology
Interaction with system and ontology
Interaction with user and ontology
These often require different considerations.
Interaction with data and ontology
Typical competency questions:
Does it cover all of my data (by class name)?
Do the definitions of the classes correspond to my data?
Is x a subclass of y (because I need it to be)?
Can I use this to integrate with Dr Smith's data?
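Some of these competency questions can be turned into simple automated checks. The Python sketch below answers two of them against a hypothetical ontology fragment: what fraction of my data terms match a class label or synonym (ignoring case), and is x a subclass of y through the told is_a links. All IDs and labels are invented for illustration; in practice you would load labels and parents from the candidate ontology, and exact string matching will understate coverage.

```python
# Hypothetical ontology fragment: class ID -> labels/synonyms, and told is_a parents.
LABELS = {
    "EX:1": {"assay"},
    "EX:2": {"microarray assay", "microarray"},
    "EX:3": {"study", "investigation"},
}
PARENTS = {"EX:2": {"EX:1"}}

def label_coverage(data_terms, labels):
    """Fraction of data terms matching some label or synonym, plus the misses."""
    known = {name.casefold() for names in labels.values() for name in names}
    missing = [t for t in data_terms if t.casefold() not in known]
    return (len(data_terms) - len(missing)) / len(data_terms), missing

def is_subclass(x, y, parents):
    """True if y is reachable from x by following is_a links (or x == y)."""
    stack, seen = [x], set()
    while stack:
        cls = stack.pop()
        if cls == y:
            return True
        if cls not in seen:
            seen.add(cls)
            stack.extend(parents.get(cls, ()))
    return False

score, missing = label_coverage(["Assay", "Microarray", "Sequencing"], LABELS)
print(round(score, 2), missing)              # 0.67 ['Sequencing']
print(is_subclass("EX:2", "EX:1", PARENTS))  # True
```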
Interaction with system and ontology
Typical competency questions:
What are the limits of the querying that is possible?
Will it allow real-time querying?
Is it so big it will break everything?
Will an ontology update break my system?
Interaction with user and ontology
Typical competency questions:
Does the hierarchy look like something the user would understand?
Do the names of things correspond to the user's understanding?
Do the definitions of things correspond to the user's understanding?
Task: Finding ontologies for your data
Ex1 (example): for zebrafish, the transcripts observed included GLUT, TOMM40 and TKT. Pathways relating to energy metabolism were also observed (1,100 metabolic, 10 glycolysis/gluconeogenesis, 30 pentose phosphate, 51 fructose and mannose metabolism, 52 galactose metabolism, and 71 fatty acid metabolism pathways).
Ex2: look at the OBI, BioAssay and NCI Thesaurus ontologies for these terms: assay, study, microarray.
Ex3 (clinical data): osteosarcoma biopsy, Homo sapiens, 4 years, MAP 2 presurgery, 60% necrosis, time until recurrence of 126.3 months.