Acquiring Experience with Ontology and Vocabularies

Similar documents
NCI Thesaurus, managing towards an ontology

Vendor: The Open Group. Exam Code: OG Exam Name: TOGAF 9 Part 1. Version: Demo

warwick.ac.uk/lib-publications

Languages and tools for building and using ontologies. Simon Jupp, James Malone

A Semantic Web-Based Approach for Harvesting Multilingual Textual. definitions from Wikipedia to support ICD-11 revision

Interoperability and Semantics in Use- Application of UML, XMI and MDA to Precision Medicine and Cancer Research

What s a BA to do with Data? Discover and define standard data elements in business terms

Vocabulary-Driven Enterprise Architecture Development Guidelines for DoDAF AV-2: Design and Development of the Integrated Dictionary

Data Governance Central to Data Management Success

Things to consider when using Semantics in your Information Management strategy. Toby Conrad Smartlogic

Powering Knowledge Discovery. Insights from big data with Linguamatics I2E

The National Cancer Institute's Thésaurus and Ontology

Semantic Web Company. PoolParty - Server. PoolParty - Technical White Paper.

WHO ICD11 Wiki LexWiki, Semantic MediaWiki and the International Classification of Diseases

Use of Semantic Technologies at Eli Lilly and Company. J Phil Brooks Information Consultant, SE Data Team Discover IT Eli Lilly and Company

Terminology Harmonization

HealthGrids: In Search for Sustainable Solutions

Integration in the 21 st -Century Enterprise. Thomas Blackadar American Chemical Society Meeting New York, September 10, 2003

Component-Based Software Engineering TIP

Maximizing the Value of STM Content through Semantic Enrichment. Frank Stumpf December 1, 2009

Open Ontology Repository Initiative

Tania Tudorache Stanford University. - Ontolog forum invited talk04. October 2007

ehealth EIF ehealth European Interoperability Framework European Commission ISA Work Programme

METADATA REGISTRY, ISO/IEC 11179

Ontrez Project Report National Center for Biomedical Ontology November, 2007

Event Metamodel and Profile (EMP) Proposed RFP Updated Sept, 2007

ACCELERATE YOUR SHAREPOINT ADOPTION AND ROI WITH CONTENT INTELLIGENCE

Vaccine data collection tool Oct Functions, Indicators & Sub-Indicators

Customer Guidance For Requesting Changes to SNOMED CT

Data Quality Assessment Tool for health and social care. October 2018

SNOMED Clinical Terms

The Open Group SOA Ontology Technical Standard. Clive Hatton

Creating a Corporate Taxonomy. Internet Librarian November 2001 Betsy Farr Cogliano

M403 ehealth Interoperability Overview

NIST Normative Test Process Document: e-prescribing (erx) Test Tool

HL7 V3 User Model. The V3 Editing Team. April 9, Ockham Information Services LLC

WP7: Patents Case Study

Precise Medication Extraction using Agile Text Mining

SNOMED CT Implementation Approaches. National Resource Centre for EHR Standards (NRCeS) C-DAC, Pune

Applying Auto-Data Classification Techniques for Large Data Sets

BPS Suite and the OCEG Capability Model. Mapping the OCEG Capability Model to the BPS Suite s product capability.

Experiences in Data Quality

European Platform on Rare Diseases Registration

A method for recommending ontology alignment strategies

EVIDENCE SEARCHING IN EBM. By: Masoud Mohammadi

Controlled Medical Vocabulary in the CPR Generations

2 The IBM Data Governance Unified Process

DATA Act Information Model Schema (DAIMS) Architecture. U.S. Department of the Treasury

Prototyping a Biomedical Ontology Recommender Service

Step: 9 Conduct Data Standardization

Phase II CAQH CORE 202 Certification Policy version March 2011 CAQH 2011

FHA Federal Health Information Model (FHIM) Information Modeling Process Guide

Proposed Regional ehealth Strategy ( )

National Open Source Strategy

SEMANTIC SUPPORT FOR MEDICAL IMAGE SEARCH AND RETRIEVAL

Introduction to Data Management for Ocean Science Research

FHA Federal Health Information Model (FHIM) Terminology Modeling Process

OG0-091 Q&As TOGAF 9 Part 1

Ontology Summit2007 Survey Response Analysis. Ken Baclawski Northeastern University

SC32 WG2 Metadata Standards Tutorial

UMLS-Query: A Perl Module for Querying the UMLS

Security Automation Developer Days Winter 2010 Conference Preliminary High Level Agenda

Terminologies, Knowledge Organization Systems, Ontologies

European Conference on Quality and Methodology in Official Statistics (Q2008), 8-11, July, 2008, Rome - Italy

cameo Enterprise Architecture UPDM / DoDAF / MODAF / SysML / BPMN / SoaML USER GUIDE version 17.0

Chapter Two: Conformance Clause

FIBO Operational Ontologies Briefing for the Object Management Group

Organisation de Coopération et de Développement Économiques Organisation for Economic Co-operation and Development

A scalable AI Knowledge Graph Solution for Healthcare (and many other industries) Dr. Jans Aasman

Business Analysis for Practitioners - Requirements Elicitation and Analysis (Domain 3)

for TOGAF Practitioners Hands-on training to deliver an Architecture Project using the TOGAF Architecture Development Method

FDA XML Data Format Requirements Specification

Applying LIS Disciplines to the DMBOK Knowledge Areas. Susan Von Fruke, MLIS Federal Reserve Bank of Minneapolis

350 Index 2005 GOAL/QPC

Knowledge Retrieval. Franz J. Kurfess. Computer Science Department California Polytechnic State University San Luis Obispo, CA, U.S.A.

Solutions Technology, Inc. (STI) Corporate Capability Brief

Specification of a Cardiovascular Metadata Model: A Consensus Standard

Phase I CAQH CORE 102: Eligibility and Benefits Certification Policy version March 2011

On the Design and Implementation of a Generalized Process for Business Statistics

Case Study Update on Structured Content Approaches at Genzyme

Community-based ontology development, alignment, and evaluation. Natasha Noy Stanford Center for Biomedical Informatics Research Stanford University

Managing the Emerging Semantic Risks

INFORMATION ASSURANCE DIRECTORATE

CTSA Program Common Metric for Informatics Solutions

Scaling Interoperable Trust through a Trustmark Marketplace

Towards Ease of Building Legos in Assessing ehealth Language Technologies

Toward Horizon 2020: INSPIRE, PSI and other EU policies on data sharing and standardization

Improving Metadata Compliance and Assessing Quality Metrics with a Standards Library

INFORMATION ASSURANCE DIRECTORATE

ECHA s Dissemination website

What s Out There and Where Do I find it: Enterprise Metacard Builder Resource Portal

Modeling Issues Modeling Enterprises. Modeling

Comments submitted at: ange+framework

Core Technology Development Team Meeting

Semantic Technologies for Nuclear Knowledge Modelling and Applications

The Model-Driven Semantic Web Emerging Standards & Technologies

E I. GreenScreen Certified

ICGI Recommendations for Federal Public Websites

FREQUENTLY ASKED QUESTIONS

National Antimicrobial Resistance Surveillance

Transcription:

Acquiring Experience with Ontology and Vocabularies Walt Melo Risa Mayan Jean Stanford The author's affiliation with The MITRE Corporation is provided for identification purposes only, and is not intended to convey or imply MITRE's concurrence with, or support for, the positions, opinions or viewpoints expressed by the authors.

About MITRE Context of MITRE work: Independent FFRDC No vendor affiliations Page 2

Background The agency receives a large amount of information through free text documents and images created by external entities. The submitted information drives many areas of the agency s work: Enforcement activities Policy development Product approval Public health research Regulators and scientists need to: Analyze the submitted documentation Research internal and external documentation related to domain Formulate policies based on document analysis Page 3

Agency Problem Difficult to find relevant documents. Need better ways to support: Naming conventions and data standards Document classifications or taxonomy Document traceability Standard terminology for components described in the documents Difficult to query across internal and external information sources: federated search Difficult to perform queries using both structured and nonstructured information Page 4

Take Ways This presentation describes work done to: Create an ontology from publicly-available resources Improve search of unstructured data via semantic text mining Enhance data quality by leveraging structured terminologies and data standards Page 5

Approach for Applying Ontology and SOA to Improve Federated Search of Documents 1) Define Business Architecture 2) Define Service & Technical Architecture 3) Define Ontology & Vocabularies Page 6

Environmental Scan Business Architecture How Prioritized relevant business functions as defined by Enterprise Architecture Developed business processes Used business modeling (BPMN) Conducted Workshops with SMEs and key stakeholders Leveraged best practices from the field to define business processes Why Align the enterprise business processes to its business vision and strategic goals. Identify needed capabilities (what type of business services are needed to support ontological search) Define roles & responsibilities Who will be responsible for doing what Obtain buy-in Disseminate ideas Inputs to Requirements Page 7

Environmental Scan Service & Technical Architecture How Gathered requirements What capabilities are needed? Technology inventory / survey Identified business service blueprint What are the major business & system services? What capabilities they must provide? Mapped business services to business processes What business & system services are need to support the prioritized business processes? Why Define an architecture blueprint for business services and technology Identify capabilities gaps What services are currently available? What services are needed? Prioritize procurement process What type of COTS can provide the needed capabilities? Such as: ontology engineering tools semantic text mining vocabulary management Page 8

Environmental Scan Ontology & Vocabulary How Used an approach based on semantic web technology and ontology for: Defining data standards Creating a conceptual data model Addressing federated data integration Improving document classification Reusing existing ontologies relevant to domain Why Improve search capabilities by scientists Define a common vocabulary Allow easy integration of structure and unstructured federated data sources Allow automatic document classification Enhance collaboration among inter- and intra- research groups Page 9

Definitions What is an Ontology? An ontology defines the terms used to describe and represent an area of knowledge. It includes computer-usable definitions of concepts in the domain and the relationships among them. Why should we care? Ontologies are used by people, databases, and applications that need to share domain information, such as the health care domain 10

Rationale Permit semantic interoperability - standard terminologies ensure the same words and phrases mean the same thing Provide consistent way to annotate industry submissions concerning products, ingredients, and constituents Enable translation of health-related information received from industry into a structured format that can be analyzed / mined Provide structured way to map the domain specific ontology with other relevant ontologies, such as diseases, toxicology, and so on Allow the semantic integration of data from different data sources a semantic data warehouse

An Example: Deceases Carcinoma Malignant Lung Neoplasm isa isa Lung Carcinoma Synonyms: Cancer of Lung Lung Cancer Cancer of the Lung Carcinoma of the Lung Disclaimer: this is not the actual ontology created by the agency 12

An Example: Chemical Substances Propanone Mehyl Ketone isa isa Acetone Synonyms: Dimethyl ketone 2-Propanone C3H6O dimethylcetone Disclaimer: this is not the actual ontology created by the agency 13

High Level View of the Ontology Development Cycle Scope Determine the intended scope of the Taxonomy Acquire Acquire domain knowledge Borrow Borrow from (or relate to) existing Taxonomies Validate Validate the consistency of the Taxonomy Deploy Deploy the Taxonomy Maintain Maintain the Taxonomy Baclawski K, Niu T. Ontologies for bioinformatics. Cambridge: MIT Press; 2006. Page 14

Scope A bio-medical domain with a strong interrelationship with other domains Medical Chemical Diseases Toxicology Legal entities Standard organizations Page 15

Acquire Domain Knowledge Leverage SME from different domains: Chemical Medical Toxicology Manufacture Public Health Utilize public domain publications Specialized publications Well known web sites maintained by non profit organizations Standards Reuse existing efforts National Cancer Institute (NCI) National Library of Medicine Page 16

Reuse: Ontologies Being Investigated Chemical Entities of Biological Interest ~ freely available dictionary of molecular entities focused on small chemical compounds. National Cancer Institute (NCI) Thesaurus ~ widely recognized standard for biomedical coding and reference, used by a broad variety of public and private partners both nationally and internationally. SNOMED CT (Systematized Nomenclature of Medicine-- Clinical Terms). And so on Do not re-invent the wheel: Import existing ontologies Map created concepts to these ontologies 17

Validation Method Gold Standard Task Based Data or Corpus Driven Manual Assessment Against A Set Of Pre-Defined Criteria Notes Compare to a gold standard. In our case, there is no gold standard, but it is possible to develop metrics of quality (such as term coverage, lack of ambiguity) that can be applied to candidate taxonomies. This approach involves testing the candidate taxonomies against a specific task (such as identifying all variations on a specific chemical component) and assessing the results. This method compares the fit of a terminology to texts in a domain. In our case, this involves testing how many concepts in the terminology were found in a set of industry documents and evaluating how useful the terminology was in navigating through the documents. This would involve manual inspection of the terminology and development of specific related criteria, such as the number of synonyms that were correctly identified. In our case, this validation was not yet done. G MAIGA, DDEMBE WILLIAMS, A Flexible Approach for User Evaluation of Biomedical Ontologies. Int l Journal of Computing and ICT Research, Vol. 2, No. 2, December 2008 Page 18

Corpus Driven Validation: an example Concept Preferred Name NIST CAS# CHEBI_15347 acetone Yes 67-64-1 CHEBI_15366 acetic acid Yes 64-19-7 CHEBI_22698 benzaldehydes CHEBI_31457 decanal Yes 112-31-2 CHEBI_29309 methyl Yes 2229-07-4 CHEBI_29805 glycolate CHEBI_35701 ester CHEBI_18346 vanillin Yes 121-33-5 CHEBI_22695 base CHEBI_25627 octadecadienoic acid CHEBI_27542 methyl oleate CHEBI_32368 undecanoic acid Yes 112-37-8 CHEBI_32544 nicotinate CHEBI_35209 label CHEBI_35366 fatty acid CHEBI_42504 pentadecanoic acid Yes 1002-84-2 CHEBI_48408 ethyl vanillin Yes 121-32-4 19

Deploy Several alternatives Internal Only for inter agency systems, users, and authorized partners Normative External Available to external communities of interest Open Source ontology? National Center for Biomedical Ontology? NCI EVS (Enterprise Vocabulary Services (EVS)? Guidance and reference A combination of both Alternatives being evaluated Page 20

Maintain Create a taxonomy management group Unified team composed of key stakeholders Data standard group Leverage collaborative development tools, such as Collaborative Protégé, supporting: Discussion threads Proposing and voting Annotating ontology components Support for user, groups, and access policies Version and release control Page 21

Tools Ontology Engineering Tool Protégé TopBraid Semantic Text Mining ODIE NCBO Annotator Open Calais SmartLogic Ontology repository Virtuoso AllegroGraph Market research is in progress: This list is not exhaustive Page 22

Conclusion Semantic Web technology and products can be used integrate heterogeneous data sources in an architecture where semantics plays a pivotal role with metadata exchanges and ontology-based searches Semantic Text Mining tools allow analysts to: execute queries which combine structured data with unstructured data navigate unstructured data in a structured way Ontologies, taxonomies, and vocabularies can improve the productivity of those who need to review and evaluate large amount of documentation from different information sources Page 23