Ontologies Guidelines for Best Practice

To support practical application and mapping

Author(s): Pistoia Alliance Ontologies Mapping Project team
Version: 1.2
Date: 7th April 2016

Summary

These best practice guidelines are designed to check how suitable source ontologies are for mapping. They place emphasis on the application of ontologies in the Life Science industry to encourage best practice and to aid mapping of ontologies in a particular domain. This public resource was developed as part of the Pistoia Alliance Ontologies Mapping project.

Contents

Context and Purpose
  Background
  Motivation and purpose
Best practice use cases
  Use Case: Curation of disease annotation
  Use Case: Data harmonisation
  Use Case: Text Mining
  Use Case: Data integration
  Use Case: Experimental Investigation
Guidelines for Best Practice
  Format
  URIs
  Versioning
  Documentation
  Users
  Authority locus
  Maintenance
  License
  Content delineation
  Content coverage
  Content quality
  Textual definitions
  Naming conventions
  Relations
  Conserved URIs
Positive and negative aspects
Appendices
  Application of the Guidelines to a checklist
  Mapping of the Guidelines
References

Context and Purpose

Background

These guidelines for best practice support the application and mapping of ontologies in the life sciences as part of the Pistoia Alliance Ontologies Mapping project, which creates better tools and services for mapping ontologies to facilitate their exploitation. Ontologies can include hierarchical relationships, taxonomies, classifications and/or vocabularies, which are becoming increasingly important for the support of research and development. They have numerous applications, such as knowledge management, data integration and text mining, where researchers need to analyse large quantities of complex data as part of their daily work. The Ontologies Mapping Project will give users access to standardised tools and methodologies to map and visualise ontologies, and to understand ontology structure, potential overlaps and equivalence of meaning. The outcome of this project will be to help users integrate, understand and analyse their data more effectively.

Motivation and purpose

This document describes the best practice guidelines that are designed to check how suitable source ontologies are for mapping. It places emphasis on the application of ontologies in the Life Science industry to encourage best practice and to explain how this relates to the mapping of ontologies in a particular domain. In some areas of Life Science, such as Clinical Sciences, best practice is mature enough to be governed by appropriate authorities (e.g. FDA, CDISC, EMA, IDMP etc.), whereas in Preclinical and Translational Research areas, best practices and data standards tend to be much less mature and can even be absent. These guidelines will identify and align with existing communities and authorities, especially those that are relevant to research and early development, for particular ontology domains regarded as critical to industry needs rather than the vast field of Life Science as a whole.

Best practice use cases

The use cases below show typical applications for ontologies and mappings of disease, phenotype and experimental investigation, which serve as "test cases" for these guidelines.

Use Case: Curation of disease annotation

Publicly available datasets from the ArrayExpress database can be curated into data/metadata cataloguing platforms to create an exemplar resource. Ontologies and standards are used throughout the platform to ensure the standardisation of data and metadata. Ontology terms are chosen to represent certain diseases described within the ArrayExpress experiments. Examples are given below in which two reference disease ontologies are used: Human Disease Ontology (DOID) and Human Phenotype Ontology (HP), which bring complementary strengths. Terms for curation were found by text search using the Ontobee or NCBO BioPortal ontology browsers. The examples below show how mapping between these ontologies enables harmonised application, but this process depends on the context of application, which here is annotation of data from a gene expression experiment.

The term uveal melanoma (Human Disease Ontology DOID_6039) was used to describe the disease in the ArrayExpress experiment E-GEOD "Mda-9/Syntenin-1 is expressed in uveal melanoma and correlates with metastatic progression".
  o Uveal melanoma is also present as an exact synonym in the Human Phenotype Ontology term Intraocular melanoma (HP).
  o The term uveal melanoma cell also appears in the results list and is present in the BRENDA tissue ontology. This was not the term we wanted, since we wanted to represent the disease under investigation and not the cell type.

The term Renal Clear Cell Carcinoma (Human Disease Ontology DOID_4467) was used to describe the disease in the ArrayExpress experiment E-GEOD "Digital gene expression (DGE) sequencing of 10 pairs samples between kidney normal tissue and cancer tissue".
  o However, in the Experimental Factor Ontology (EFO) there is a similar term, clear cell renal carcinoma (EFO), which has Renal clear cell carcinoma listed as an alternative term.
  o Also, the obsolete term renal clear cell carcinoma appears in the results list following a text search; we would not use it since it is obsolete.

Use Case: Data harmonisation

This use case shows harmonisation of data obtained from multiple ontological sources to improve standardisation of named entities with similar meaning. Diseases can be described by different acronyms and shorthand text. Assigning ontology terms can bring integrity to the data, which allows improved integration and querying across datasets. The examples below show that equivalent entities in three ontologies (EFO, DOID and HP) have different synonyms across these resources, which can be harmonised through mapping of equivalence.

myocardial infarction (EFO), myocardial infarction (DOID), Myocardial infarction (HP)
  o can be described as: MI, Myokardinfarkt
hypertension (EFO), hypertension (DOID), Hypertension (HP)
  o can be described as: Arthypertension, HTN
diabetes mellitus (EFO), diabetes mellitus (DOID), Diabetes mellitus (HP)
  o can be described as: Diab1, Diab2, Diabmellitus, diabetes, Diab_Mellitus, diabtype

Use Case: Text Mining

In general, ontologies are not designed to support Text Mining and the related Semantic Search. It is, however, good practice to use these resources for Text Mining and, in particular, for lexical extraction / named entity extraction. Frequently used resources are, for example, MeSH or MedDRA. Ontologies mapping can be very useful to support Text Mining because the synonym lists of ontologies (if there are any) are usually not comprehensive. Mapping ontologies would therefore allow for automatic synonym enrichment, where the synonym set would be the union of all synonyms provided by the mapped input ontologies. On the other hand, creating larger synonym sets from ontologies should be done carefully, because the mapped ontologies can differ in granularity and, by transitivity, the merged synonym sets can grow to include terms that are not truly equivalent. ICD-10 clusters a lot of different concepts in a single code, whereas other resources such as SNOMED or MedDRA are more fine-grained.
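The synonym enrichment described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the term identifiers ("EFO:mi", "DOID:mi" and so on) are hypothetical placeholders rather than real accessions, the assignment of synonyms to individual ontologies is invented for the example, and the function name merge_synonyms is not taken from any tool. Equivalence mappings are followed transitively, which is exactly the behaviour that calls for care when the mapped ontologies differ in granularity.

```python
def merge_synonyms(synonyms, mappings):
    """Return, for every term, the union of synonyms over all terms mapped to it.

    synonyms: dict of term ID -> set of synonym strings
    mappings: iterable of (term ID, term ID) pairs asserted as equivalent
    """
    parent = {}

    def find(term):
        # Follow parent links to the representative of the term's group.
        parent.setdefault(term, term)
        while parent[term] != term:
            parent[term] = parent[parent[term]]  # path halving keeps chains short
            term = parent[term]
        return term

    for term in synonyms:
        find(term)
    for left, right in mappings:
        # Equivalence is treated as transitive: A=B and B=C puts A, B, C together.
        parent[find(left)] = find(right)

    pooled = {}
    for term, names in synonyms.items():
        pooled.setdefault(find(term), set()).update(names)
    return {term: set(pooled.get(find(term), set())) for term in list(parent)}


# Hypothetical placeholder IDs and an illustrative split of synonyms.
synonyms = {
    "EFO:mi": {"myocardial infarction", "MI"},
    "DOID:mi": {"myocardial infarction", "Myokardinfarkt"},
    "HP:mi": {"Myocardial infarction"},
    "EFO:htn": {"hypertension", "HTN"},
}
mappings = [("EFO:mi", "DOID:mi"), ("DOID:mi", "HP:mi")]

enriched = merge_synonyms(synonyms, mappings)
print(sorted(enriched["HP:mi"]))
```

Because the grouping is transitive, a single over-broad mapping (for example to a coarse ICD-10 code) would pull every linked synonym into one set, so in practice the mappings fed into such a step would need to be reviewed first.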
Some ontologies are not even appropriate for Text Mining, as they contain mainly longer pre-coordinated phrases which do not occur as exact matches in free text. An example would be the Gene Ontology. A concept like "apoptotic process involved in heart morphogenesis" will rarely be used as such in free text, but will rather be paraphrased in several phrases.

Another important parameter for the usage of ontologies in Text Mining is the linguistic capability of the Text Mining tool itself. By and large there are two approaches. At one extreme you could spell out all lexical variations of a term (a term can of course consist of multiple words). This will inflate your dictionary for Text Mining but it will speed up the indexing/annotation process. At the other extreme you could leave the detection of variations to algorithms, including de-inflection, de-derivation and decomposition as well as permutations of words. The second scenario would result in a lean terminology but a higher computational load at indexing/annotation time.

What is completely missing in ontologies is a linguistic description layer for word forms and their relationship to each other (e.g. normal form vs case variations, plural etc.). Another example of linguistic descriptions would be ambiguity markers or confidence levels for a specific term in an annotation scenario. Please note that these may vary depending on the context or use case (e.g. indexing a canteen menu with a gene annotator in enterprise search).

In the context of Ontologies Mapping, a typical use case for text mining is the creation of an exhaustive synonym list from multiple input sources to serve lexical extraction or named entity extraction. Here is an example for the drug product domain. The Roche drug "tamiflu" has a lot of different synonym types depending on the status of the pipeline. We have compiled the synonyms from different input sources, including open access sources (ChEMBL, ChEBI, DrugBank etc.) and commercial ones (Pharmaprojects, Integrity, Pharma Partnering etc.). Here is an extract of the synonym list:

  o tamiflu (trade name)
  o oseltamivir (generic name, INN)
  o GS-4104, HSDB-7433, RO (LabCode)
  o CCOC(=O)C1=C[C@@H](OC(CC)CC)[C@H](NC(=O)C)[C@@H](N)C1 (SMILES)
  o ethyl (3R,4R,5S)-4-acetamido-5-amino-3-(pentan-3-yloxy)cyclohex-1-ene-1-carboxylate (IUPAC name)
  o etc.

Variations such as GS4104, Ro are pre-calculated.

Use Case: Data integration

Data integration includes the data harmonisation use case described above.
Commonly an ETL process (Extract, Transform, Load) is used to integrate data from multiple sources which have heterogeneous data schemes and formats and which often use different ontologies as a reference vocabulary, e.g., for diseases. Thus the role of ontology mappings for data integration is two-fold. Firstly, mappings are needed for integration at schema level, using mappings to upper-level or mid-level ontologies, e.g., to BFO, OGMS, RO. Secondly, mappings are needed between large reference ontologies which are used by different sources to express data, e.g., diseases may be expressed using DOID, parts of SNOMED CT, ICD-10, MedDRA etc. These mappings of reference ontologies are used in the transformation step to harmonise the reference vocabulary of different sources.

In the paper "From Symptoms to Diseases - Creating the Missing Link", we demonstrate how multiple sets of mappings can be used to integrate information on disease-symptom relations from many different ontologies of the BioPortal. We show that mapping quality is essential to obtain valuable integration results.

Use Case: Experimental Investigation

Taxonomies and ontologies provide a vocabulary and semantic model for the representation of laboratory analytical processes. They are used for standardised representation of laboratory analytical processes, the materials and devices involved and the corresponding results, in order to overcome vendor-specific formats. To enhance interoperability, the Allotrope Foundation defines a set of mappings to other ontologies. For instance, entities of the processes domain are mapped to the following OBO ontologies:

  o Chemical Methods Ontology (CHMO), e.g., "atmospheric pressure chemical ionization": af-p:afp_ skos:closeMatch obo:chmo_
  o Ontology for Biomedical Investigations (OBI), e.g., "planning": af-p:afp_ skos:closeMatch obo:obi_
  o Information Artifact Ontology (IAO), e.g., "plan": af-p:afp_ skos:closeMatch obo:iao_
  o Mass Spectrometry Ontology (MS), e.g., "mean of spectra": af-p:afp_ skos:closeMatch obo:ms_

Guidelines for Best Practice

These guidelines for best practice in ontologies support their application and mapping in the selected ontology domains (Disease and Phenotype for now). The Open Biological and Biomedical Ontologies (OBO) Foundry has developed numerous principles, which have been accepted by this open community. Many of these accepted principles can be regarded as guidelines for best practice, so we reuse much of the description of each relevant OBO principle below.

The following guidelines also align well with "Ten Simple Rules for Selecting a Bio-ontology" by Malone et al [PLOS: Computational Biology 2016] DOI: /journal.pcbi. The only missing rule is 10: Sometimes an Ontology is Not Needed at All. This rule points out that selection of an ontology should be driven by an understanding of the user requirements, which is crucial for deciding whether an ontology is really needed. Other forms of knowledge representation, such as vocabularies, are often much simpler to understand than ontologies and may be sufficient to meet the requirements of the user.

Format

The ontology should be made available in a common formal language, in an accepted concrete syntax. The purpose of a common format is to allow the maximum number of people to access and reuse an ontology. Recommended implementations include OBO format, or an OWL or OWL2 concrete syntax such as RDF/XML, OWL2-XML or OWL2-Manchester syntax. This means that achieving interoperability requires an acceptable syntax implemented in one of the commonly accepted representational models (e.g. OBO, OWL, SKOS etc., possibly with defined restrictions). More details, including examples, can be found via the FP_002_format OBO wiki page (accepted).

The representation of the ontologies can also make use of vocabularies, which are mainly implemented in RDF. These describe a set of standardised, pre-defined concepts and predicates, such as SKOS, VoID, FOAF etc. In the case of overlapping standards, resources like LOV (Linked Open Vocabularies) might help, because they provide a comprehensive overview of which vocabularies are most widely applied.

Many ontologies in the Disease, Phenotype and Experimental Investigation domains are available in one or more of these common formats via the OBO Foundry, NCBO BioPortal or ontology home web sites. Ontologies that use a non-standard format are likely to impede interoperability, which will limit their application and mapping.

URIs

Each class and relation (property) in an ontology should have a Uniform Resource Identifier (URI) to address identifier space. The identifier should be constructed from a base URI, a prefix that is unique to the ontology (e.g. GO, CHEBI, HPO) and a local identifier. The local identifier should be a numeric string and should not consist of labels or mnemonics meaningful to humans. This means that ontology IDs will take the form <IDSPACE>:<NUMBER>. The ontology prefix (<IDSPACE>) must be registered with an appropriate authority, such as the OBO library, in advance. Although it is tempting to make a URI meaningful to humans, the primary purpose of URIs is machine readability, where the overriding consideration is stability (see "Cool URIs don't change"), which facilitates interoperability. More details, including examples, can be found via the FP_003_URIs OBO wiki page (accepted). This guideline aligns with Malone et al 2016 Rule 3: The Ontology Classes and Relationships Should Persist.

Most ontologies in the Disease, Phenotype and Experimental Investigation domains use URIs to address identifier space. They are available via the OBO Foundry, NCBO BioPortal or directly from home web sites. Ontologies that use non-standard or human readable identifiers are likely to impede interoperability, which will limit their application and mapping.

Versioning

The ontology must disclose versioning through metadata to reflect the history of change. The provider should show through this metadata that it has procedures for identifying distinct successive versions. This description summarises the FP_004_versioning OBO wiki page (accepted). This guideline aligns with Malone et al 2016 Rule 8: Previous Versions Should Be Available, and is extended to include access to previous versions.

Versioning need not be applied only to the entire ontology, as more fine-grained approaches exist. The RDF specification of the metadata registry for CDISC, based on ISO, supports versioning at the concept level (mms:administereditem). Please note that validated environments require versioning at a term/concept level.

Most ontologies in the Disease, Phenotype and Experimental Investigation domains disclose versioning. Those that do not are likely to be of very limited value.
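Returning to the URIs guideline, the <IDSPACE>:<NUMBER> form can be checked mechanically. The sketch below is a minimal illustration, not a reference implementation: the regular expression, the function name expand_id and the choice of the OBO PURL base (http://purl.obolibrary.org/obo/, with the prefix and number joined by an underscore as in DOID_6039) are assumptions made for this example, and other ontologies may publish their URIs differently.

```python
import re

# Assumed shape of an ontology ID: a prefix, a colon, then a purely numeric local ID.
ID_PATTERN = re.compile(r"^(?P<idspace>[A-Za-z][A-Za-z0-9_]*):(?P<number>[0-9]+)$")

# Assumption for illustration: the OBO PURL base URI.
OBO_BASE_URI = "http://purl.obolibrary.org/obo/"


def expand_id(ontology_id, base_uri=OBO_BASE_URI):
    """Turn an ID such as 'DOID:6039' into a URI, rejecting non-numeric local IDs."""
    match = ID_PATTERN.match(ontology_id)
    if match is None:
        raise ValueError(f"not of the form <IDSPACE>:<NUMBER>: {ontology_id!r}")
    return f"{base_uri}{match['idspace']}_{match['number']}"


print(expand_id("DOID:6039"))

try:
    expand_id("DOID:uveal_melanoma")  # human-readable local IDs are rejected
except ValueError as error:
    print(error)
```

Keeping the local identifier numeric means a label can be corrected or renamed without changing the URI, which is the stability property the guideline asks for.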

Documentation

The ontology must be documented in sufficient quality and detail. This documentation should be located on the ontology home website in the form of a published paper describing the ontology and manuals for developers and users. Essential aspects of the documentation should also be recorded as metadata embedded within the ontology. This description summarises the FP_008_documented OBO wiki page (accepted).

Most ontologies in the Disease, Phenotype and Experimental Investigation domains provide documentation on the home website as links to publications and manuals. Absence of documentation makes the ontology less likely to be adopted for application.

Users

The ontology developers should document the evidence that the ontology is used by multiple independent people or organisations. This ensures that the ontology tackles a relevant scientific area and does so in a usable and sustainable fashion. It is important to be able to illustrate usage outside of the immediate circle of ontology developers and stakeholders. More details, including examples of evidence, can be found via the FP_009_users OBO wiki page (accepted). This guideline aligns with Malone et al 2016 Rule 6: The Ontology Should Be Developed by the Community but Not Incapacitated by It.

Many ontologies in the Disease, Phenotype and Experimental Investigation domains provide evidence of a substantial user community. They often include a documentation page with links to databases using the ontology for annotation. Good examples of such evidence are usage in semantic web resources (e.g. ArrayExpress) and in diverse software applications, including text mining and analysis workflow pipelines, as well as publications showing that the ontology is being used in research. Such evidence of a substantial user community makes the ontology more likely to be a credible source.

Authority locus

There should be clear responsibility for the ontology, for ensuring continued maintenance in light of scientific advance and for prompt response to user feedback. A single point (mechanism) of contact for support and feedback should be provided on the ontology home website. This description has been adapted from the FP_011_locus_of_authority OBO wiki page (accepted). This guideline is also related to Malone et al 2016 Rule 6: The Ontology Should Be Developed by the Community but Not Incapacitated by It.

Most ontologies in the Disease, Phenotype and Experimental Investigation domains identify the responsible leader and development team, along with details for making contact, e.g. with queries and feedback from the users. Provision of a simple mechanism for feedback from users to the provider is an important aspect of best practice, which could be regarded as a "process" feature.

Maintenance

Ontologies have to be maintained to reflect the continuous advance of science, otherwise they become stale and unable to represent the latest knowledge. The ontology provider must provide evidence that the ontology is being maintained with appropriate regularity, rigorous quality and a funding source. This evidence of maintenance and funding should be documented through the ontology home web site. This accepted OBO principle, included in the original 2006 principles, is noted as requiring an update on the FP_016_maintenance OBO wiki page (accepted). This guideline aligns with Malone et al 2016 Rule 7: The Ontology Should Be under Active Development.

Another important criterion with regard to maintenance is response time: how long does it take to process a request? Resources like the NCI Thesaurus or MedDRA offer means to place a request; however, update cycles can be very long.

Most ontologies in the Disease, Phenotype and Experimental Investigation domains provide documented evidence of maintenance and funding on their home web site. Those that do not are likely to be of very limited value.

License

Openly available ontologies can be used by all without any constraint other than (a) their origin must be acknowledged and (b) they are not to be altered and subsequently redistributed in altered form under the original name or with the same identifiers. All ontologies available in the OBO Foundry are open, whereas license terms for ontologies available through the BioPortal can be much more restricted. This is important to understand because it could impact on interoperability and the freedom to undertake mapping. Further details about recommendations for open licenses, implementation and examples can be found via the FP_001_open OBO wiki page (accepted). This guideline aligns with Malone et al 2016 Rule 9: Open Data Requires Open Ontologies.

Many ontologies in the Disease, Phenotype and Experimental Investigation domains are available openly via the OBO Foundry, NCBO BioPortal or directly from ontology home web sites. Ontologies that have license restrictions are likely to impede interoperability, which can limit their application and mapping.

Content delineation

Each class and relation (property) in an ontology should have clearly delineated content of acceptable precision. The ontology should be orthogonal to other related ontologies which adhere to best practice. The major reason for this is to allow two different ontologies, for example anatomy and biological process, to be combined through additional relationships. These relationships could then be used to constrain when terms could be jointly applied to describe complementary (but distinguishable) perspectives on the same biological or medical entity. As a corollary to this, we would strive for community acceptance of a single ontology for one domain, rather than encouraging rivalry between ontologies. This description summarises the FP_005_delineated_content OBO wiki page (accepted). This guideline aligns with Malone et al 2016 Rule 1: The Ontology Should Be about a Specific Domain of Knowledge.

An important aspect of content delineation is to make it clear whether the ontology is designed to be a reference for a particular domain. Alternatively, it could be intended as an application ontology, which is designed to support a particular application, often in multiple domains. An example of this is the Experimental Factor Ontology (EFO), which is an application ontology designed to support experimental investigations. EFO includes the reuse of relevant reference ontologies such as Human Phenotype Ontology (HPO) and (Human) Disease Ontology (DO). An upper level ontology is another type of content delineation, which is designed to bridge across more specific ontologies in particular domains. An example of this in the Disease and Phenotype domain is Basic Formal Ontology (BFO) as a generic upper ontology, Ontology for Biomedical Investigations (OBI) as a disease-neutral ontology, and numerous disease-specific ontologies.

Many ontologies in the Disease, Phenotype and Experimental Investigation domains have clearly delineated content, which makes them more likely to bring unique value for application. It also makes it more likely, as discussed already, that delineated content will facilitate interoperability through mapping additional relationships between different ontologies, including equivalence.

Content coverage

Ontologies should include content of acceptable coverage, so that there is a sufficient number of concepts to cover an ontology domain, with enough terms and associated metadata such as name, label, definition and synonyms. There also needs to be sufficient breadth and depth of coverage, combined with organisational principles (e.g. taxonomies, partonomies) and granularity (detail and depth of modelling). Another obvious aspect of coverage is missing content. This guideline aligns with Malone et al 2016 Rule 2: The Ontology Should Reflect Current Understanding of Biological Systems.

The number of instances of classes and relations can give an indication of coverage. This guideline can also be tested through sampling of instances in the ontology.

Many ontologies in the Disease, Phenotype and Experimental Investigation domains have sufficient coverage of content to represent knowledge in meaningful ways. An ontology with inadequate coverage is likely to be of limited use.

Content quality

Ontology content should be of acceptable quality, which has two aspects. First, to what extent have these guidelines for best practice been respected (formal correctness)? Second, has the content of the domain been properly modelled (correctness of the content)? This guideline aligns with Malone et al 2016 Rule 2: The Ontology Should Reflect Current Understanding of Biological Systems.

This guideline can be tested through sampling of instances in the ontology. For example, in the Phenotype and Disease domain, an ontology could contain two concepts called "Tooth Disease" and "Caries". "Caries" is a sub-concept of "Tooth Disease". However, "Tooth Caries" is a synonym of "Tooth Disease". Formally, no one could prevent you from adding the synonym relationship between "Tooth Disease" and "Tooth Caries". However, from an engineering perspective the domain is not properly captured. Poor representation of knowledge such as this will limit the usefulness of an ontology.

Textual definitions

The ontology needs to contain textual definitions for a substantial and representative fraction of its terms, plus equivalent formal definitions (for at least a substantial number of terms). For terms lacking textual definitions, there should be evidence of implementation of a strategy to provide definitions for all remaining undefined terms. Text definitions should be unique (i.e. no two terms should share a definition). This is the vocabulary or dictionary component of an ontology, which provides definitions for class terms and those with equivalent meaning (i.e. synonyms). This description summarises the FP_006_textual_definitions OBO wiki page (accepted). This guideline aligns with Malone et al 2016 Rule 4: Classes Should Contain Textual Definitions.
This guideline can be tested through sampling of instances in the ontology. Many ontologies in the Disease, Phenotype and Experimental Investigation domains have textual definitions (vocabulary) for class terms, which makes them more likely to bring unique value for application. A high proportion of quality textual definitions will facilitate interoperability through mapping the meaning (semantics) of equivalence in different ontologies.

Naming conventions

Naming conventions used by ontology providers tend to be heterogeneous and inconsistent. This is because names often emerge in an ad hoc manner rather than through an agreed nomenclature. Of course there are exceptions which are much more mature and consistent, such as the HUGO Gene Nomenclature Committee, which provides the authoritative source of human gene names. Another excellent example is Chemical Entities of Biological Interest (ChEBI), which started as a curated nomenclature for small molecules and has developed into a mature ontology in the OBO Foundry. The OBO principle wiki page, FP_012_naming_conventions, is under development and mostly mentions the publication entitled "Survey-based naming conventions for use in OBO Foundry ontology development" by Schober et al 2009. This guideline aligns with Malone et al 2016 Rule 5: Textual Definitions Should Be Written for Domain Experts.

This guideline can be tested through sampling of instances in the ontology. Some ontologies in the Disease, Phenotype and Experimental Investigation domains have naming conventions driven by support for applications, such as phenotype for an inherited disease or clinical terms for clinical investigations, which can result in naming conventions of mixed quality and form. This can hinder interoperability, making it difficult to map between different ontologies in this domain.

Relations

Best practice for the representation of relations in ontologies is still emerging. This is because the standard formats for ontologies, such as OBO and OWL, use instance level relations rather than type level relations. Here types are what is described in textbooks, whereas instances are what we observe, measure or perform experiments on. One approach to the representation of relations in an ontology is to make use of an upper level ontology, such as Basic Formal Ontology (BFO). This is described fully in the book by Arp, Smith and Spear, Building Ontologies with Basic Formal Ontology, published by MIT Press, August 17, 2015. The original formulation of the OBO principle for relations is noted as requiring some modifications on the FP_007_relations OBO wiki page (accepted).
Many ontologies in the Disease, Phenotype and Experimental Investigation domains include relations, usually at the instance level. This is an emerging area of best practice which may impact on mapping of equivalence between ontologies in this domain.

Conserved URIs

Ontologies often overlap in a particular domain. This overlap is harmless and can be mapped readily when the URIs of the source ontologies are preserved. Here interoperability is guaranteed wherever the relevant terms and their URIs are conserved, e.g. Gene Ontology in the OBO Foundry, where reuse is evident in BioPortal search results. Similarly, upper level or "meta" ontologies, e.g. Uberon in the anatomy domain, contain cross references to source URIs, which makes overlap harmless by design.

Many ontologies in the Disease, Phenotype and Experimental Investigation domains reuse terms and include cross references to source URIs. This best practice makes mapping between overlapping ontologies straightforward. However, when source URIs are NOT conserved, overlap between ontologies in the same domain tends to be harmful, making mapping of equivalence much more difficult.
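The difference between harmless and harmful overlap described above can be illustrated with a small Python sketch. It is a simplified illustration under stated assumptions: each ontology is reduced to a dictionary of URI to label, the example.org URIs are hypothetical placeholders, and matching labels case-insensitively is only a crude stand-in for a real mapping method. Terms that share a URI are conserved and need no mapping; terms with the same label under different URIs are candidates that still have to be mapped.

```python
def classify_overlap(terms_a, terms_b):
    """Split the overlap of two ontologies into conserved URIs and mapping candidates.

    terms_a, terms_b: dict of term URI -> preferred label
    Returns (conserved, needs_mapping), where conserved is a sorted list of shared
    URIs and needs_mapping is a sorted list of (uri_a, uri_b) pairs whose labels
    match but whose URIs differ.
    """
    conserved = sorted(terms_a.keys() & terms_b.keys())

    # Index the labels of terms that exist only in the second ontology.
    labels_b = {}
    for uri, label in terms_b.items():
        if uri not in terms_a:
            labels_b.setdefault(label.casefold(), []).append(uri)

    needs_mapping = []
    for uri, label in terms_a.items():
        if uri in terms_b:
            continue  # conserved URI: equivalence is already explicit
        for other_uri in labels_b.get(label.casefold(), []):
            needs_mapping.append((uri, other_uri))
    return conserved, sorted(needs_mapping)


# Illustrative data; the example.org URIs are hypothetical placeholders.
reference = {
    "http://purl.obolibrary.org/obo/DOID_4467": "renal clear cell carcinoma",
    "http://example.org/ref/0000001": "hypertension",
}
application = {
    "http://purl.obolibrary.org/obo/DOID_4467": "renal clear cell carcinoma",
    "http://example.org/app/0000042": "Hypertension",
}

conserved, needs_mapping = classify_overlap(reference, application)
print(conserved)
print(needs_mapping)
```

In this toy example the reused DOID term is recognised immediately from its URI, while the two hypertension terms only surface as a candidate pair that would still require review.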

Positive and negative aspects

Positive and negative aspects of the above guidelines of best practice are listed in the table below. The positive aspects are encouraged, whereas the negative aspects can hinder application and mapping of ontologies and should be avoided or minimised:

Guideline                   Positive aspect                           Negative aspect
1. Format                   Open standard                             Non-standard
2. URIs                     Used and persistent                       Not used and not persistent
3. Versioning               Used with date                            Not used and no date
4. Documentation            High quality and coverage                 Poor or absent
5. Users                    Evidence beyond provider                  Poor or missing evidence
6. Authority                Clearly defined                           Unclear or missing
7. Maintenance              Evidence of currency and sustainability   Poor or missing evidence
8. License                  Clearly defined terms and conditions      License terms can restrict use
9. Content delineation      Clear                                     Unclear or no delineation
10. Content coverage *      Acceptable                                Inadequate or sparse or gaps
11. Content quality *       Acceptable                                Poor or inaccurate
12. Textual definitions *   Acceptable                                Insufficient or absent
13. Naming conventions *    Acceptable                                Insufficient or absent
14. Relations               Consistent, clear model                   Inconsistent
15. Conserved URIs          Cross reference to source URIs            Missing source URIs

* tested through relevant sampling

18 Appendices Application of the Guidelines to a checklist Pistoia Alliance Guidelines checklist for disease phenotype experimental investigation ontologies.xlsx The guidelines checklist has been populated with ontologies for disease, phenotype and experimental investigation to illustrate use of the guidelines. Local download of this sheet can serve as a template for further consideration of ontologies in other data domains. Mapping of the Guidelines to OBO principles and the 10 rules of Malone et al 2016 Guideline OBO Principle 10 Rules of Malone et al Format 2. URIs 3. Versioning Rule 3: The Ontology Classes and Relationships Should Persist. Rule 8: Previous Versions Should Be Available 4. Documentation 5. Users 6. Authority 7. Maintenance Rule 6: The Ontology Should Be Developed by the Community but Not Incapacitated by It Rule 6: The Ontology Should Be Developed by the Community but Not Incapacitated by It Rule 7: The Ontology Should Be under Active Development 18 of 20

8. License
Rule 9: Open Data Requires Open Ontologies
9. Content delineation
10. Content coverage
11. Content quality
Rule 1: The Ontology Should Be about a Specific Domain of Knowledge
Rule 2: The Ontology Should Reflect Current Understanding of Biological Systems
Rule 2: The Ontology Should Reflect Current Understanding of Biological Systems
12. Textual definitions
Rule 4: Classes Should Contain Textual Definitions
13. Naming conventions
Rule 5: Textual Definitions Should Be Written for Domain Experts
14. Relations
15. Conserved URIs

References

The Open Biological and Biomedical Ontologies (OBO) Foundry
NCBO BioPortal
Schober et al 2009, "Survey-based naming conventions for use in OBO Foundry ontology development"
Arp, Smith and Spear, Building Ontologies with Basic Formal Ontology, MIT Press, August 17, 2015
Malone et al 2016, "Ten Simple Rules for Selecting a Bio-ontology", PLOS Computational Biology, DOI: /journal.pcbi

A Semantic Web-Based Approach for Harvesting Multilingual Textual. definitions from Wikipedia to support ICD-11 revision

A Semantic Web-Based Approach for Harvesting Multilingual Textual. definitions from Wikipedia to support ICD-11 revision A Semantic Web-Based Approach for Harvesting Multilingual Textual Definitions from Wikipedia to Support ICD-11 Revision Guoqian Jiang 1,* Harold R. Solbrig 1 and Christopher G. Chute 1 1 Department of

More information

Acquiring Experience with Ontology and Vocabularies

Acquiring Experience with Ontology and Vocabularies Acquiring Experience with Ontology and Vocabularies Walt Melo Risa Mayan Jean Stanford The author's affiliation with The MITRE Corporation is provided for identification purposes only, and is not intended

More information

Languages and tools for building and using ontologies. Simon Jupp, James Malone

Languages and tools for building and using ontologies. Simon Jupp, James Malone An overview of ontology technology Languages and tools for building and using ontologies Simon Jupp, James Malone jupp@ebi.ac.uk, malone@ebi.ac.uk Outline Languages OWL and OBO classes, individuals, relations,

More information

SNOMED Clinical Terms

SNOMED Clinical Terms Representing clinical information using SNOMED Clinical Terms with different structural information models KR-MED 2008 - Phoenix David Markwell Laura Sato The Clinical Information Consultancy Ltd NHS Connecting

More information

Reducing Consumer Uncertainty Towards a Vocabulary for User-centric Geospatial Metadata

Reducing Consumer Uncertainty Towards a Vocabulary for User-centric Geospatial Metadata Meeting Host Supporting Partner Meeting Sponsors Reducing Consumer Uncertainty Towards a Vocabulary for User-centric Geospatial Metadata 105th OGC Technical Committee Palmerston North, New Zealand Dr.

More information

Reducing Consumer Uncertainty

Reducing Consumer Uncertainty Spatial Analytics Reducing Consumer Uncertainty Towards an Ontology for Geospatial User-centric Metadata Introduction Cooperative Research Centre for Spatial Information (CRCSI) in Australia Communicate

More information

warwick.ac.uk/lib-publications

warwick.ac.uk/lib-publications Original citation: Zhao, Lei, Lim Choi Keung, Sarah Niukyun and Arvanitis, Theodoros N. (2016) A BioPortalbased terminology service for health data interoperability. In: Unifying the Applications and Foundations

More information

Taking a view on bio-ontologies. Simon Jupp Functional Genomics Production Team ICBO, 2012 Graz, Austria

Taking a view on bio-ontologies. Simon Jupp Functional Genomics Production Team ICBO, 2012 Graz, Austria Taking a view on bio-ontologies Simon Jupp Functional Genomics Production Team ICBO, 2012 Graz, Austria Who we are European Bioinformatics Institute one of world s largest bio data and service providers

More information

Semantic Web Company. PoolParty - Server. PoolParty - Technical White Paper.

Semantic Web Company. PoolParty - Server. PoolParty - Technical White Paper. Semantic Web Company PoolParty - Server PoolParty - Technical White Paper http://www.poolparty.biz Table of Contents Introduction... 3 PoolParty Technical Overview... 3 PoolParty Components Overview...

More information

Semantic Technologies and CDISC Standards. Frederik Malfait, Information Architect, IMOS Consulting Scott Bahlavooni, Independent

Semantic Technologies and CDISC Standards. Frederik Malfait, Information Architect, IMOS Consulting Scott Bahlavooni, Independent Semantic Technologies and CDISC Standards Frederik Malfait, Information Architect, IMOS Consulting Scott Bahlavooni, Independent Part I Introduction to Semantic Technology Resource Description Framework

More information

FCA-Map Results for OAEI 2016

FCA-Map Results for OAEI 2016 FCA-Map Results for OAEI 2016 Mengyi Zhao 1 and Songmao Zhang 2 1,2 Institute of Mathematics, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, P. R. China 1 myzhao@amss.ac.cn,

More information

Ontology Summit2007 Survey Response Analysis. Ken Baclawski Northeastern University

Ontology Summit2007 Survey Response Analysis. Ken Baclawski Northeastern University Ontology Summit2007 Survey Response Analysis Ken Baclawski Northeastern University Outline Communities Ontology value, issues, problems, solutions Ontology languages Terms for ontology Ontologies April

More information

NCI Thesaurus, managing towards an ontology

NCI Thesaurus, managing towards an ontology NCI Thesaurus, managing towards an ontology CENDI/NKOS Workshop October 22, 2009 Gilberto Fragoso Outline Background on EVS The NCI Thesaurus BiomedGT Editing Plug-in for Protege Semantic Media Wiki supports

More information

Terminologies, Knowledge Organization Systems, Ontologies

Terminologies, Knowledge Organization Systems, Ontologies Terminologies, Knowledge Organization Systems, Ontologies Gerhard Budin University of Vienna TSS July 2012, Vienna Motivation and Purpose Knowledge Organization Systems In this unit of TSS 12, we focus

More information

Representing Multiple Standards in a Single DAM: Use of Atomic Classes

Representing Multiple Standards in a Single DAM: Use of Atomic Classes Representing Multiple Standards in a Single DAM: Use of Atomic Classes Salvatore Mungal 1 ; Mead Walker 2 ; David F Kong 3 ; Rebecca Wilgus 3 ; Dana Pinchotti 4 ; James E Tcheng 3 ; William Barry 1 ; Brian

More information

Community-based ontology development, alignment, and evaluation. Natasha Noy Stanford Center for Biomedical Informatics Research Stanford University

Community-based ontology development, alignment, and evaluation. Natasha Noy Stanford Center for Biomedical Informatics Research Stanford University Community-based ontology development, alignment, and evaluation Natasha Noy Stanford Center for Biomedical Informatics Research Stanford University Community-based Ontology... Everything Development and

More information

Semantic MediaWiki A Tool for Collaborative Vocabulary Development Harold Solbrig Division of Biomedical Informatics Mayo Clinic

Semantic MediaWiki A Tool for Collaborative Vocabulary Development Harold Solbrig Division of Biomedical Informatics Mayo Clinic Semantic MediaWiki A Tool for Collaborative Vocabulary Development Harold Solbrig Division of Biomedical Informatics Mayo Clinic Outline MediaWiki what it is, how it works Semantic MediaWiki MediaWiki

More information

Science Europe Consultation on Research Data Management

Science Europe Consultation on Research Data Management Science Europe Consultation on Research Data Management Consultation available until 30 April 2018 at http://scieur.org/rdm-consultation Introduction Science Europe and the Netherlands Organisation for

More information

SKOS. COMP62342 Sean Bechhofer

SKOS. COMP62342 Sean Bechhofer SKOS COMP62342 Sean Bechhofer sean.bechhofer@manchester.ac.uk Ontologies Metadata Resources marked-up with descriptions of their content. No good unless everyone speaks the same language; Terminologies

More information

Disease Information and Semantic Web

Disease Information and Semantic Web Rheinische Friedrich-Wilhelms-Universität Bonn Institute of Computer Science III Disease Information and Semantic Web Master s Thesis Supervisor: Prof. Sören Auer, Heiner OberKampf Turan Gojayev München,

More information

Health Information Exchange Content Model Architecture Building Block HISO

Health Information Exchange Content Model Architecture Building Block HISO Health Information Exchange Content Model Architecture Building Block HISO 10040.2 To be used in conjunction with HISO 10040.0 Health Information Exchange Overview and Glossary HISO 10040.1 Health Information

More information

New Approach to Graph Databases

New Approach to Graph Databases Paper PP05 New Approach to Graph Databases Anna Berg, Capish, Malmö, Sweden Henrik Drews, Capish, Malmö, Sweden Catharina Dahlbo, Capish, Malmö, Sweden ABSTRACT Graph databases have, during the past few

More information

Ontologies SKOS. COMP62342 Sean Bechhofer

Ontologies SKOS. COMP62342 Sean Bechhofer Ontologies SKOS COMP62342 Sean Bechhofer sean.bechhofer@manchester.ac.uk Metadata Resources marked-up with descriptions of their content. No good unless everyone speaks the same language; Terminologies

More information

POMap results for OAEI 2017

POMap results for OAEI 2017 POMap results for OAEI 2017 Amir Laadhar 1, Faiza Ghozzi 2, Imen Megdiche 1, Franck Ravat 1, Olivier Teste 1, and Faiez Gargouri 2 1 Paul Sabatier University, IRIT (CNRS/UMR 5505) 118 Route de Narbonne

More information

Knowledge Representations. How else can we represent knowledge in addition to formal logic?

Knowledge Representations. How else can we represent knowledge in addition to formal logic? Knowledge Representations How else can we represent knowledge in addition to formal logic? 1 Common Knowledge Representations Formal Logic Production Rules Semantic Nets Schemata and Frames 2 Production

More information

Opus: University of Bath Online Publication Store

Opus: University of Bath Online Publication Store Patel, M. (2004) Semantic Interoperability in Digital Library Systems. In: WP5 Forum Workshop: Semantic Interoperability in Digital Library Systems, DELOS Network of Excellence in Digital Libraries, 2004-09-16-2004-09-16,

More information

Semantic Technology. Opportunities

Semantic Technology. Opportunities Semantic Technology Opportunities Avinash Punekar Scientific Publishing Services April 2011 2 Semantic Technology April 2011 3 What is Semantic Technology? ² Semantic Web ² Web 3.0 ² Linked Open Data /

More information

XML in the bipharmaceutical

XML in the bipharmaceutical XML in the bipharmaceutical sector XML holds out the opportunity to integrate data across both the enterprise and the network of biopharmaceutical alliances - with little technological dislocation and

More information

Smart Open Services for European Patients. Work Package 3.5 Semantic Services Definition Appendix E - Ontology Specifications

Smart Open Services for European Patients. Work Package 3.5 Semantic Services Definition Appendix E - Ontology Specifications 24Am Smart Open Services for European Patients Open ehealth initiative for a European large scale pilot of Patient Summary and Electronic Prescription Work Package 3.5 Semantic Services Definition Appendix

More information

Taming Rave: How to control data collection standards?

Taming Rave: How to control data collection standards? Paper DH08 Taming Rave: How to control data collection standards? Dimitri Kutsenko, Entimo AG, Berlin, Germany Table of Contents Introduction... 1 How to organize metadata... 2 How to structure metadata...

More information

Powering Knowledge Discovery. Insights from big data with Linguamatics I2E

Powering Knowledge Discovery. Insights from big data with Linguamatics I2E Powering Knowledge Discovery Insights from big data with Linguamatics I2E Gain actionable insights from unstructured data The world now generates an overwhelming amount of data, most of it written in natural

More information

Linked Data: Fast, low cost semantic interoperability for health care?

Linked Data: Fast, low cost semantic interoperability for health care? Linked Data: Fast, low cost semantic interoperability for health care? About the presentation Part I: Motivation Why we need semantic operability in health care Why enhancing existing systems to increase

More information

Ontology-based Architecture Documentation Approach

Ontology-based Architecture Documentation Approach 4 Ontology-based Architecture Documentation Approach In this chapter we investigate how an ontology can be used for retrieving AK from SA documentation (RQ2). We first give background information on the

More information

Taxonomy Tools: Collaboration, Creation & Integration. Dow Jones & Company

Taxonomy Tools: Collaboration, Creation & Integration. Dow Jones & Company Taxonomy Tools: Collaboration, Creation & Integration Dave Clarke Global Taxonomy Director dave.clarke@dowjones.com Dow Jones & Company Introduction Software Tools for Taxonomy 1. Collaboration 2. Creation

More information

Workshop 2. > Interoperability <

Workshop 2. > Interoperability < Workshop 2 21 / 08 / 2011 > Interoperability < Heiko Zimmermann R&D Engineer, AHI CR Santec Heiko.Zimmermann@tudor.lu Interoperability definition Picture from NCI-Wiki (https://wiki.nci.nih.gov) 2 Interoperability

More information

An e-infrastructure for Language Documentation on the Web

An e-infrastructure for Language Documentation on the Web An e-infrastructure for Language Documentation on the Web Gary F. Simons, SIL International William D. Lewis, University of Washington Scott Farrar, University of Arizona D. Terence Langendoen, National

More information

Dataset-XML - A New CDISC Standard

Dataset-XML - A New CDISC Standard Dataset-XML - A New CDISC Standard Lex Jansen Principal Software Developer @ SAS CDISC XML Technologies Team Single Day Event CDISC Tools and Optimization September 29, 2014, Cary, NC Agenda Dataset-XML

More information

Prototyping a Biomedical Ontology Recommender Service

Prototyping a Biomedical Ontology Recommender Service Prototyping a Biomedical Ontology Recommender Service Clement Jonquet Nigam H. Shah Mark A. Musen jonquet@stanford.edu 1 Ontologies & data & annota@ons (1/2) Hard for biomedical researchers to find the

More information

SEMANTIC SUPPORT FOR MEDICAL IMAGE SEARCH AND RETRIEVAL

SEMANTIC SUPPORT FOR MEDICAL IMAGE SEARCH AND RETRIEVAL SEMANTIC SUPPORT FOR MEDICAL IMAGE SEARCH AND RETRIEVAL Wang Wei, Payam M. Barnaghi School of Computer Science and Information Technology The University of Nottingham Malaysia Campus {Kcy3ww, payam.barnaghi}@nottingham.edu.my

More information

What is Text Mining? Sophia Ananiadou National Centre for Text Mining University of Manchester

What is Text Mining? Sophia Ananiadou National Centre for Text Mining   University of Manchester National Centre for Text Mining www.nactem.ac.uk University of Manchester Outline Aims of text mining Text Mining steps Text Mining uses Applications 2 Aims Extract and discover knowledge hidden in text

More information

Building an effective and efficient background knowledge resource to enhance ontology matching

Building an effective and efficient background knowledge resource to enhance ontology matching Building an effective and efficient background knowledge resource to enhance ontology matching Amina Annane 1,2, Zohra Bellahsene 2, Faiçal Azouaou 1, Clement Jonquet 2,3 1 Ecole nationale Supérieure d

More information

CoE CENTRE of EXCELLENCE ON DATA WAREHOUSING

CoE CENTRE of EXCELLENCE ON DATA WAREHOUSING in partnership with Overall handbook to set up a S-DWH CoE: Deliverable: 4.6 Version: 3.1 Date: 3 November 2017 CoE CENTRE of EXCELLENCE ON DATA WAREHOUSING Handbook to set up a S-DWH 1 version 2.1 / 4

More information

Content Interoperability Strategy

Content Interoperability Strategy Content Interoperability Strategy 28th September 2005 Antoine Rizk 1 Presentation plan Introduction Context Objectives Input sources Semantic interoperability EIF Definition Semantic assets The European

More information

Ontology Matching with CIDER: Evaluation Report for the OAEI 2008

Ontology Matching with CIDER: Evaluation Report for the OAEI 2008 Ontology Matching with CIDER: Evaluation Report for the OAEI 2008 Jorge Gracia, Eduardo Mena IIS Department, University of Zaragoza, Spain {jogracia,emena}@unizar.es Abstract. Ontology matching, the task

More information

Humboldt-University of Berlin

Humboldt-University of Berlin Humboldt-University of Berlin Exploiting Link Structure to Discover Meaningful Associations between Controlled Vocabulary Terms exposé of diploma thesis of Andrej Masula 13th October 2008 supervisor: Louiqa

More information

DCO: A Mid Level Generic Data Collection Ontology

DCO: A Mid Level Generic Data Collection Ontology DCO: A Mid Level Generic Data Collection Ontology by Joel Cummings A Thesis presented to The University of Guelph In partial fulfilment of requirements for the degree of Master of Science in Computer Science

More information

Paper DS07 PhUSE 2017 CDISC Transport Standards - A Glance. Giri Balasubramanian, PRA Health Sciences Edwin Ponraj Thangarajan, PRA Health Sciences

Paper DS07 PhUSE 2017 CDISC Transport Standards - A Glance. Giri Balasubramanian, PRA Health Sciences Edwin Ponraj Thangarajan, PRA Health Sciences Paper DS07 PhUSE 2017 CDISC Transport Standards - A Glance Giri Balasubramanian, PRA Health Sciences Edwin Ponraj Thangarajan, PRA Health Sciences Agenda Paper Abstract CDISC Standards Types Why Transport

More information

INFORMATION RETRIEVAL SYSTEM: CONCEPT AND SCOPE

INFORMATION RETRIEVAL SYSTEM: CONCEPT AND SCOPE 15 : CONCEPT AND SCOPE 15.1 INTRODUCTION Information is communicated or received knowledge concerning a particular fact or circumstance. Retrieval refers to searching through stored information to find

More information

The Model-Driven Semantic Web Emerging Standards & Technologies

The Model-Driven Semantic Web Emerging Standards & Technologies The Model-Driven Semantic Web Emerging Standards & Technologies Elisa Kendall Sandpiper Software March 24, 2005 1 Model Driven Architecture (MDA ) Insulates business applications from technology evolution,

More information

Bridging the Gap between Semantic Web and Networked Sensors: A Position Paper

Bridging the Gap between Semantic Web and Networked Sensors: A Position Paper Bridging the Gap between Semantic Web and Networked Sensors: A Position Paper Xiang Su and Jukka Riekki Intelligent Systems Group and Infotech Oulu, FIN-90014, University of Oulu, Finland {Xiang.Su,Jukka.Riekki}@ee.oulu.fi

More information

Interoperability and Semantics in Use- Application of UML, XMI and MDA to Precision Medicine and Cancer Research

Interoperability and Semantics in Use- Application of UML, XMI and MDA to Precision Medicine and Cancer Research Interoperability and Semantics in Use- Application of UML, XMI and MDA to Precision Medicine and Cancer Research Ian Fore, D.Phil. Associate Director, Biorepository and Pathology Informatics Senior Program

More information

Proposal for Implementing Linked Open Data on Libraries Catalogue

Proposal for Implementing Linked Open Data on Libraries Catalogue Submitted on: 16.07.2018 Proposal for Implementing Linked Open Data on Libraries Catalogue Esraa Elsayed Abdelaziz Computer Science, Arab Academy for Science and Technology, Alexandria, Egypt. E-mail address:

More information

M403 ehealth Interoperability Overview

M403 ehealth Interoperability Overview CEN/CENELEC/ETSI M403 ehealth Interoperability Overview 27 May 2009, Bratislava Presented by Charles Parisot www.ehealth-interop.eu Mandate M/403 M/403 aims to provide a consistent set of standards to

More information

RD-Action WP5. Specification and implementation manual of the Master file for statistical reporting with Orphacodes

RD-Action WP5. Specification and implementation manual of the Master file for statistical reporting with Orphacodes RD-Action WP5 Specification and implementation manual of the Master file for statistical reporting with Orphacodes Second Part of Milestone 27: A beta master file version to be tested in some selected

More information

The Agricultural Ontology Server: A Tool for Knowledge Organisation and Integration

The Agricultural Ontology Server: A Tool for Knowledge Organisation and Integration PROJECT PROPOSAL: Reference: The Agricultural Ontology Server: A Tool for Knowledge Organisation and Integration Food and Agriculture Organization of the United Nations (GILW) Rome June 2001 Contents EXECUTIVE

More information

ONTOLOGY LIBRARIES: A STUDY FROM ONTOFIER AND ONTOLOGIST PERSPECTIVES

ONTOLOGY LIBRARIES: A STUDY FROM ONTOFIER AND ONTOLOGIST PERSPECTIVES ONTOLOGY LIBRARIES: A STUDY FROM ONTOFIER AND ONTOLOGIST PERSPECTIVES Debashis Naskar 1 and Biswanath Dutta 2 DSIC, Universitat Politècnica de València 1 DRTC, Indian Statistical Institute 2 OUTLINE Introduction

More information

Using Linked Data and taxonomies to create a quick-start smart thesaurus

Using Linked Data and taxonomies to create a quick-start smart thesaurus 7) MARJORIE HLAVA Using Linked Data and taxonomies to create a quick-start smart thesaurus 1. About the Case Organization The two current applications of this approach are a large scientific publisher

More information

Analyzing user interactions with biomedical ontologies: A visual perspective

Analyzing user interactions with biomedical ontologies: A visual perspective Analyzing user interactions with biomedical ontologies: A visual perspective Maulik R. Kamdar, Simon Walk, Tania Tudorache and Mark A. Musen Stanford Center for Biomedical Informatics Research, Stanford

More information

A Developer s Guide to the Semantic Web

A Developer s Guide to the Semantic Web A Developer s Guide to the Semantic Web von Liyang Yu 1. Auflage Springer 2011 Verlag C.H. Beck im Internet: www.beck.de ISBN 978 3 642 15969 5 schnell und portofrei erhältlich bei beck-shop.de DIE FACHBUCHHANDLUNG

More information

Content Enrichment. An essential strategic capability for every publisher. Enriched content. Delivered.

Content Enrichment. An essential strategic capability for every publisher. Enriched content. Delivered. Content Enrichment An essential strategic capability for every publisher Enriched content. Delivered. An essential strategic capability for every publisher Overview Content is at the centre of everything

More information

OMV / CTS2 Crosswalk

OMV / CTS2 Crosswalk OMV / CTS2 Crosswalk Outline Common Terminology Services 2 (CTS2) - a brief introduction CTS2 and OMV a crosswalk 2012/01/17 OOR Metadata Workgroup 2 OMV / CTS2 Crosswalk CTS2 A BRIEF INTRODUCTION 2012/01/17

More information

Generalized Document Data Model for Integrating Autonomous Applications

Generalized Document Data Model for Integrating Autonomous Applications 6 th International Conference on Applied Informatics Eger, Hungary, January 27 31, 2004. Generalized Document Data Model for Integrating Autonomous Applications Zsolt Hernáth, Zoltán Vincellér Abstract

More information

TDWI strives to provide course books that are contentrich and that serve as useful reference documents after a class has ended.

TDWI strives to provide course books that are contentrich and that serve as useful reference documents after a class has ended. Previews of TDWI course books offer an opportunity to see the quality of our material and help you to select the courses that best fit your needs. The previews cannot be printed. TDWI strives to provide

More information

The Semantic Web DEFINITIONS & APPLICATIONS

The Semantic Web DEFINITIONS & APPLICATIONS The Semantic Web DEFINITIONS & APPLICATIONS Data on the Web There are more an more data on the Web Government data, health related data, general knowledge, company information, flight information, restaurants,

More information

The Neuroscience Information Framework Practical experiences in using and building community ontologies

The Neuroscience Information Framework Practical experiences in using and building community ontologies The Neuroscience Information Framework Practical experiences in using and building community ontologies Maryann Martone, Ph. D. University of California, San Diego 1 The Neuroscience Information Framework:

More information

Data Management Glossary

Data Management Glossary Data Management Glossary A Access path: The route through a system by which data is found, accessed and retrieved Agile methodology: An approach to software development which takes incremental, iterative

More information

Introduction to RDF and the Semantic Web for the life sciences

Introduction to RDF and the Semantic Web for the life sciences Introduction to RDF and the Semantic Web for the life sciences Simon Jupp Sample Phenotypes and Ontologies Team European Bioinformatics Institute jupp@ebi.ac.uk Practical sessions Converting data to RDF

More information

Open Ontology Repository Initiative

Open Ontology Repository Initiative Open Ontology Repository Initiative Frank Olken Lawrence Berkeley National Laboratory National Science Foundation folken@nsf.gov presented to CENDI/NKOS Workshop World Bank Sept. 11, 2008 Version 6.0 DISCLAIMER

More information

Study and guidelines on Geospatial Linked Data as part of ISA Action 1.17 Resource Description Framework

Study and guidelines on Geospatial Linked Data as part of ISA Action 1.17 Resource Description Framework DG Joint Research Center Study and guidelines on Geospatial Linked Data as part of ISA Action 1.17 Resource Description Framework 6 th of May 2014 Danny Vandenbroucke Diederik Tirry Agenda 1 Introduction

More information

Data Quality Assessment Tool for health and social care. October 2018

Data Quality Assessment Tool for health and social care. October 2018 Data Quality Assessment Tool for health and social care October 2018 Introduction This interactive data quality assessment tool has been developed to meet the needs of a broad range of health and social

More information

Semantic Web for Earth and Environmental Terminology (SWEET) Status, Future Development and Community Building

Semantic Web for Earth and Environmental Terminology (SWEET) Status, Future Development and Community Building Semantic Web for Earth and Environmental Terminology (SWEET) 2018 Status, Future Development and Community Building 2 Agenda and Purpose Current status of SWEET e.g. What has the community been doing?

More information

Using Ontologies for Data and Semantic Integration

Using Ontologies for Data and Semantic Integration Using Ontologies for Data and Semantic Integration Monica Crubézy Stanford Medical Informatics, Stanford University ~~ November 4, 2003 Ontologies Conceptualize a domain of discourse, an area of expertise

More information

BPS Suite and the OCEG Capability Model. Mapping the OCEG Capability Model to the BPS Suite s product capability.

BPS Suite and the OCEG Capability Model. Mapping the OCEG Capability Model to the BPS Suite s product capability. BPS Suite and the OCEG Capability Model Mapping the OCEG Capability Model to the BPS Suite s product capability. BPS Contents Introduction... 2 GRC activities... 2 BPS and the Capability Model for GRC...

More information

A Knowledge-Based System for the Specification of Variables in Clinical Trials

A Knowledge-Based System for the Specification of Variables in Clinical Trials A Knowledge-Based System for the Specification of Variables in Clinical Trials Matthias Löbe, Barbara Strotmann, Kai-Uwe Hoop, Roland Mücke Institute for Medical Informatics, Statistics and Epidemiology

More information

Efficient, Scalable, and Provenance-Aware Management of Linked Data

Efficient, Scalable, and Provenance-Aware Management of Linked Data Efficient, Scalable, and Provenance-Aware Management of Linked Data Marcin Wylot 1 Motivation and objectives of the research The proliferation of heterogeneous Linked Data on the Web requires data management

More information

Balanced Large Scale Knowledge Matching Using LSH Forest

Balanced Large Scale Knowledge Matching Using LSH Forest Balanced Large Scale Knowledge Matching Using LSH Forest 1st International KEYSTONE Conference IKC 2015 Coimbra Portugal, 8-9 September 2015 Michael Cochez * Vagan Terziyan * Vadim Ermolayev ** * Industrial

More information

Enabling efficiency through Data Governance: a phased approach

Enabling efficiency through Data Governance: a phased approach Enabling efficiency through Data Governance: a phased approach Transform your process efficiency, decision-making, and customer engagement by improving data accuracy An Experian white paper Enabling efficiency

More information

6. The Document Engineering Approach

6. The Document Engineering Approach 6. The Document Engineering Approach DE + IA (INFO 243) - 11 February 2008 Bob Glushko 1 of 40 Plan for Today's Class Modeling Methodologies The Document Engineering Approach 2 of 40 What Modeling Methodologies

More information

ISO CTS2 and Value Set Binding. Harold Solbrig Mayo Clinic

ISO CTS2 and Value Set Binding. Harold Solbrig Mayo Clinic ISO 79 CTS2 and Value Set Binding Harold Solbrig Mayo Clinic ISO 79 Information technology - Metadata registries (MDR) Owning group is ISO/IEC JTC /SC 32 Organization responsible for SQL standard Six part

More information

Customisable Curation Workflows in Argo

Customisable Curation Workflows in Argo Customisable Curation Workflows in Argo Rafal Rak*, Riza Batista-Navarro, Andrew Rowley, Jacob Carter and Sophia Ananiadou National Centre for Text Mining, University of Manchester, UK *Corresponding author:

More information

WHO ICD11 Wiki LexWiki, Semantic MediaWiki and the International Classification of Diseases

WHO ICD11 Wiki LexWiki, Semantic MediaWiki and the International Classification of Diseases WHO ICD11 Wiki LexWiki, Semantic MediaWiki and the International Classification of Diseases Guoqian Jiang, PhD Harold Solbrig Division of Biomedical Statistics and Informatics Mayo Clinic College of Medicine

More information

Project Name. The Eclipse Integrated Computational Environment. Jay Jay Billings, ORNL Parent Project. None selected yet.

Project Name. The Eclipse Integrated Computational Environment. Jay Jay Billings, ORNL Parent Project. None selected yet. Project Name The Eclipse Integrated Computational Environment Jay Jay Billings, ORNL 20140219 Parent Project None selected yet. Background The science and engineering community relies heavily on modeling

More information

Natural Language Processing with PoolParty

Natural Language Processing with PoolParty Natural Language Processing with PoolParty Table of Content Introduction to PoolParty 2 Resolving Language Problems 4 Key Features 5 Entity Extraction and Term Extraction 5 Shadow Concepts 6 Word Sense

More information

MDA & Semantic Web Services Integrating SWSF & OWL with ODM

MDA & Semantic Web Services Integrating SWSF & OWL with ODM MDA & Semantic Web Services Integrating SWSF & OWL with ODM Elisa Kendall Sandpiper Software March 30, 2006 Level Setting An ontology specifies a rich description of the Terminology, concepts, nomenclature

More information

NCBO Ontology Recommender 2.0: an enhanced approach for biomedical ontology recommendation

NCBO Ontology Recommender 2.0: an enhanced approach for biomedical ontology recommendation Martínez-Romero et al. Journal of Biomedical Semantics (2017) 8:21 DOI 10.1186/s13326-017-0128-y RESEARCH Open Access NCBO Ontology Recommender 2.0: an enhanced approach for biomedical ontology recommendation

More information

University of Bath. Publication date: Document Version Publisher's PDF, also known as Version of record. Link to publication

University of Bath. Publication date: Document Version Publisher's PDF, also known as Version of record. Link to publication Citation for published version: Patel, M & Duke, M 2004, 'Knowledge Discovery in an Agents Environment' Paper presented at European Semantic Web Symposium 2004, Heraklion, Crete, UK United Kingdom, 9/05/04-11/05/04,.

re3data.org - Making research data repositories visible and discoverable

Robert Ulrich, Karlsruhe Institute of Technology; Hans-Jürgen Goebelbecker, Karlsruhe Institute of Technology; Frank Scholze, Karlsruhe

Description Cross-domain Task Force Research Design Statement

Revised 8 November 2004. This document outlines the research design to be followed by the Description Cross-domain Task Force (DTF) of InterPARES

From Lexicon To Mammographic Ontology: Experiences and Lessons

Bo Hu, Srinandan Dasmahapatra and Nigel Shadbolt, Department of Electronics and Computer Science, University of Southampton, United Kingdom. Email: {bh,

Reproducible Workflows Biomedical Research. P11 2018, Berlin, Germany

Contributors: Leslie McIntosh, Research Data Alliance, U.S., Executive Director; Oya Beyan, Aachen University, Germany; Anthony Juehne, RDA,

Vocabulary-Driven Enterprise Architecture Development Guidelines for DoDAF AV-2: Design and Development of the Integrated Dictionary

December 17, 2009. Version History: Version, Publication Date, Author, Description

Corso di Biblioteche Digitali

Vittore Casarosa, casarosa@isti.cnr.it, tel. 050-315 3115, mobile 348-397 2168. Office hours after the lecture or by appointment. Final assessment: 70-75% oral exam, 25-30% project

LexGrid Philosophy, Model and Interfaces Harold R Solbrig Division of Biomedical Statistics and Informatics Mayo Clinic

Outline: Why the LexGrid model was created; LexGrid approach and principles; Key aspects

National Centre for Text Mining NaCTeM. e-science and data mining workshop

John Keane, Co-Director, NaCTeM, john.keane@manchester.ac.uk, School of Informatics, University of Manchester. What is text mining?

Office of the Government Chief Information Officer XML SCHEMA DESIGN AND MANAGEMENT GUIDE PART I: OVERVIEW [G55-1]

Version. November 00. The Government of the Hong Kong Special Administrative Region. COPYRIGHT

Digital repositories as research infrastructure: a UK perspective

Dr Liz Lyon, Director. This work is licensed under a Creative Commons Licence Attribution-ShareAlike 2.0. UKOLN is supported by: Presentation

HITSP Standards Harmonization Process -- A report on progress

Document Number: HITSP 06 N 75. Date: May 4, 2006. Arlington, VA. What Was Done: Reviewed obligations from federal contract; Observed

Semantic Knowledge Discovery OntoChem IT Solutions

OntoChem IT Solutions GmbH, Blücherstr. 24, 06120 Halle (Saale), Germany. Tel. +49 345 4780472, Fax: +49 345 4780471, mail: info(at)ontochem.com. Get the Gold!

CDISC Standards End-to-End: Enabling QbD in Data Management Sam Hume

Shared Health and Research Electronic Library (SHARE): A global electronic repository for developing, integrating

Draft SDMX Technical Standards (Version 2.0) - Disposition Log Project Team

General (see below for examples): In the document "Framework for SDMX technical standards, version 2" it is stated
