Semantic Technology. Opportunities


Avinash Punekar, Scientific Publishing Services
April 2011

What is Semantic Technology?
- Semantic Web
- Web 3.0
- Linked Open Data / Linked Enterprise Data
- Web of Data
- Web of Things
- GGG (Giant Global Graph)
- It is about using software to leverage our understanding and use of information
- And more!

It is all about DATA
- Semantic data is not only machine READABLE.
- It is machine UNDERSTANDABLE!

It is not
- A software package
- Something that will ever be complete
- A replacement for the current Web
- A pipe dream
- A silver bullet

It is
- A Web-scale architecture
- A metadata technology
- A layer of meaning on the existing Web
- In use TODAY!

Semantic enrichment is a process whereby text within a research or scholarly document is annotated with semantic metadata. It enables free text to be converted into a database of knowledge by extracting concepts and linking them to related knowledge bases.
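The enrichment process described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `KNOWLEDGE_BASE` dictionary, its `example.org` URIs and the `enrich` function are hypothetical, and a real enrichment pipeline would rely on much more than exact string matching.

```python
import re

# Hypothetical knowledge base: surface term -> concept URI (invented for illustration).
KNOWLEDGE_BASE = {
    "aspirin": "http://example.org/kb/drug/aspirin",
    "headache": "http://example.org/kb/condition/headache",
}

def enrich(text, kb):
    """Annotate free text: return (start, end, surface form, concept URI) in text order."""
    annotations = []
    for term, uri in kb.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            annotations.append((m.start(), m.end(), m.group(0), uri))
    return sorted(annotations)

text = "Aspirin is often taken for headache."
annotations = enrich(text, KNOWLEDGE_BASE)
for start, end, surface, uri in annotations:
    print(f"{surface!r} at {start}-{end} -> {uri}")
```

Each annotation keeps the character offsets, so the original text stays untouched while the extracted concepts can be stored and queried separately.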

Machine Understanding: How?
- By uniquely identifying THINGS
- By uniquely identifying RELATIONSHIPS
- By using TRIPLES

What is a THING?
A THING is anything that can be uniquely identified by a URI or a literal (string).
- Me → http://twitter.com/ericaxel
- My postal code → http://www.city-data.com/zips/90043.html
- The White House → Lat: 38.89859, Long: -77.035971
- L.A. County's sales tax rate → 9.750%
- [image] → http://ericfranzon.com/operator.jpg

What is a RELATIONSHIP?
Something which connects two THINGS uniquely.

    --- isFatherOf --->

    <owl:ObjectProperty rdf:ID="isFather">
      <rdfs:domain rdf:resource="#Person"/>
      <rdfs:range rdf:resource="#Person"/>
    </owl:ObjectProperty>
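A domain and range declaration like the one above lets software draw conclusions about the things a relationship connects. The following is a rough sketch of that idea, not an OWL reasoner: the `ex:` names, the `properties` dictionary and the `infer_types` function are all assumptions made for illustration.

```python
# Hypothetical property declaration mirroring the domain and range idea.
properties = {
    "ex:isFatherOf": {"domain": "ex:Person", "range": "ex:Person"},
}

def infer_types(triples, properties):
    """For each triple using a declared property, infer the types of its subject and object."""
    inferred = set()
    for subject, predicate, obj in triples:
        declaration = properties.get(predicate)
        if declaration is not None:
            inferred.add((subject, "rdf:type", declaration["domain"]))
            inferred.add((obj, "rdf:type", declaration["range"]))
    return inferred

facts = [("ex:john", "ex:isFatherOf", "ex:mary")]
inferred = infer_types(facts, properties)
print(sorted(inferred))
```

Nothing in `facts` says that John or Mary is a person; that knowledge comes entirely from the declaration of the relationship.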

What is a TRIPLE?

    Thing   --- Relationship ---> Thing
    Subject --- Predicate    ---> Object
    book    --- has title    ---> This a
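As a minimal sketch, a triple can be modelled as a three-field record. The `Triple` class, the `objects_of` helper and every `example.org` URI below are hypothetical names chosen for illustration; they do not come from any particular RDF library.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    subject: str    # a THING, identified by a URI
    predicate: str  # a RELATIONSHIP, also identified by a URI
    obj: str        # another THING: a URI or a literal

# Hypothetical URIs, invented for illustration only.
BOOK = "http://example.org/book/1"
HAS_TITLE = "http://example.org/vocab/hasTitle"
HAS_AUTHOR = "http://example.org/vocab/hasAuthor"

triples = {
    Triple(BOOK, HAS_TITLE, "An Example Title"),
    Triple(BOOK, HAS_AUTHOR, "http://example.org/person/42"),
}

def objects_of(store, subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return sorted(t.obj for t in store if t.subject == subject and t.predicate == predicate)

titles = objects_of(triples, BOOK, HAS_TITLE)
print(titles)
```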

Where is it now?

Technologies
- RDBMS: Data, Schema, Query Language
- Semantic: Data, Schema (Vocabularies), Query Language
- Data Language: Resource Description Framework (RDF)
- RDF is good for distributing data across the Web and pretending it's in one place

    http://plushbeautybar.com  dc:creator  http://www.ericaxel.com/foaf.rdf
    http://www.geonames.org/maps/google_34.021_-118.396.html  dc:location  N 34° 1' 16'' W 118° 23' 47''
    http://twitter.com/ericaxel  foaf:knows  Dave McComb
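The idea of "pretending it's in one place" can be sketched with plain Python sets: because triples from different sources share identifiers, they can simply be merged and then traversed as one graph. The two sources, their `example.org` URIs and the `follow` helper are hypothetical.

```python
# Two hypothetical sources, each publishing its own triples.
source_a = {
    ("http://example.org/shop", "dc:creator", "http://example.org/people/eric"),
}
source_b = {
    ("http://example.org/people/eric", "foaf:name", "Eric"),
    ("http://example.org/people/eric", "foaf:knows", "http://example.org/people/dave"),
}

# Merging is a set union: shared URIs join the data without any schema mapping.
merged = source_a | source_b

def follow(store, start, *predicates):
    """Walk a chain of predicates from `start` and return the set of things reached."""
    current = {start}
    for predicate in predicates:
        current = {o for (s, p, o) in store if s in current and p == predicate}
    return current

creator_names = follow(merged, "http://example.org/shop", "dc:creator", "foaf:name")
print(creator_names)
```

The question "what is the name of the shop's creator?" can only be answered from the merged set, since each source holds half of the path.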

Vocabularies
- Ontologies
- Taxonomies
- Folksonomies

Some are ways of describing vocabularies:
- RDF: property triple RELATIONSHIPS
- RDFS (RDF Schema)
- OWL (Web Ontology Language)

Some are controlled vocabularies like:
- Dublin Core
- SKOS (Simple Knowledge Organization System)
- SIOC (Semantically-Interlinked Online Communities)

Reuse or make up your own!

Query Language: SPARQL
SPARQL is a recursive acronym:
- SPARQL
- Protocol
- And
- RDF
- Query
- Language
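At its core, a SPARQL query is a set of triple patterns with variables that must all match at once. The sketch below imitates that matching in plain Python; it is not SPARQL and not a library API. The `match` function, the `ex:` identifiers and the sample data are assumptions made for illustration.

```python
def match(store, patterns, binding=None):
    """Yield variable bindings that satisfy every triple pattern in `patterns`."""
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in store:
        candidate = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # A variable binds on first use and must agree on later uses.
                if candidate.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            yield from match(store, rest, candidate)

store = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob", "foaf:knows", "ex:carol"),
    ("ex:bob", "foaf:name", "Bob"),
    ("ex:carol", "foaf:name", "Carol"),
]

# Roughly: SELECT ?name WHERE { ex:alice foaf:knows ?friend . ?friend foaf:name ?name }
query = [("ex:alice", "foaf:knows", "?friend"), ("?friend", "foaf:name", "?name")]
names = sorted(b["?name"] for b in match(store, query))
print(names)
```

The shared variable `?friend` is what joins the two patterns, much as a join key does in an RDBMS query.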

Semantic Components
The semantic web comprises the standards and tools of HTML5, XML, XML Schema, RDF, RDF Schema and OWL that are organized in the Semantic Web Stack. The OWL Web Ontology Language Overview describes the function and relationship of each of these components of the semantic web:
- XML provides an elemental syntax for content structure within documents, yet associates no semantics with the meaning of the content contained within.
- XML Schema is a language for providing and restricting the structure and content of elements contained within XML documents.
- RDF is a simple language for expressing data models, which refer to objects ("resources") and their relationships. An RDF-based model can be represented in XML syntax.
- RDF Schema extends RDF and is a vocabulary for describing properties and classes of RDF-based resources, with semantics for generalization hierarchies of such properties and classes.
- OWL adds more vocabulary for describing properties and classes: among others, relations between classes (e.g. disjointness), cardinality (e.g. "exactly one"), equality, richer typing of properties, characteristics of properties (e.g. symmetry), and enumerated classes.
- SPARQL is a protocol and query language for semantic web data sources.
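The class hierarchies that RDF Schema describes can be illustrated with a small sketch of subclass inference. It assumes a hypothetical `ex:` vocabulary and a plain dictionary in place of real RDF data; the function names are invented for this example.

```python
# Hypothetical class hierarchy: class -> direct superclasses.
subclass_of = {
    "ex:Journal": ["ex:Periodical"],
    "ex:Periodical": ["ex:Publication"],
}

# Hypothetical declared type of one resource.
types = {"ex:nature": "ex:Journal"}

def superclasses(cls, hierarchy):
    """Return all direct and indirect superclasses of `cls` (transitive closure)."""
    seen = set()
    stack = [cls]
    while stack:
        for parent in hierarchy.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def inferred_types(resource):
    """A resource belongs to its declared class and to every superclass of it."""
    declared = types[resource]
    return {declared} | superclasses(declared, subclass_of)

result = inferred_types("ex:nature")
print(sorted(result))
```

Only one type is stated for `ex:nature`; the other two follow from the hierarchy, which is the kind of conclusion a schema-aware tool can draw without extra data entry.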

Semantic Projects
- DBpedia: An effort to publish structured data extracted from Wikipedia. The data is published in RDF and made available on the Web for use under the GNU Free Documentation License, allowing Semantic Web agents to provide inferencing and advanced querying over the Wikipedia-derived dataset and facilitating interlinking, re-use and extension in other data sources.
- FOAF: A popular application of the semantic web is Friend of a Friend (FOAF), which uses RDF to describe the relationships people have to other people and the "things" around them. FOAF is an example of how the Semantic Web attempts to make use of the relationships within a social context.
- GoodRelations for e-commerce: A huge potential for Semantic Web technologies lies in adding data structure and typed links to the vast amount of offer data, product model features, and tendering / request for quotation data. The GoodRelations ontology is a popular vocabulary for expressing product information, prices, payment options, etc. It also allows expressing demand in a straightforward fashion. GoodRelations has been adopted by BestBuy, Yahoo, OpenLink Software, O'Reilly Media, the Book Mashup, and many others.
- NextBio: A database consolidating high-throughput life sciences experimental data tagged and connected via biomedical ontologies. NextBio is accessible via a search engine interface. Researchers can contribute their findings for incorporation into the database. The database currently supports gene or protein expression data and is steadily expanding to support other biological data types.

Web Ontology Language (OWL)
The Web Ontology Language (OWL) is a family of knowledge representation languages for authoring ontologies, endorsed by the World Wide Web Consortium. The languages are characterised by formal semantics and RDF/XML-based serializations for the Semantic Web. OWL has attracted academic, medical and commercial interest.

Examples of published ontologies:
- Basic Formal Ontology, a formal upper ontology designed to support scientific research
- BioPAX, an ontology for the exchange and interoperability of biological pathway (cellular processes) data
- BMO, an e-business Model Ontology based on a review of enterprise ontologies and business model literature
- CCO (Cell-Cycle Ontology), an application ontology that represents the cell cycle
- CContology, an e-business ontology to support online customer complaint management
- CIDOC Conceptual Reference Model, an ontology for cultural heritage
- COSMO, a Foundation Ontology designed to contain representations of all of the primitive concepts needed to logically specify the meanings of any domain entity. It is intended to serve as a basic ontology that can be used to translate among the representations in other ontologies or databases. It started as a merger of the basic elements of the OpenCyc and SUMO ontologies, and has been supplemented with other ontology elements (types, relations) so as to include representations of all of the words in the Longman dictionary defining vocabulary.

Web Ontology Language (OWL), continued
- Cyc, a large Foundation Ontology for formal representation of the universe of discourse
- Disease Ontology, designed to facilitate the mapping of diseases and associated conditions to particular medical codes
- DOLCE, a Descriptive Ontology for Linguistic and Cognitive Engineering
- Dublin Core, a simple ontology for documents and publishing
- Foundational, Core and Linguistic Ontologies
- Foundational Model of Anatomy, an ontology for human anatomy
- Gene Ontology for genomics
- GUM (Generalized Upper Model), a linguistically motivated ontology for mediating between client systems and natural language technology
- NIFSTD Ontologies from the Neuroscience Information Framework: a modular set of ontologies for the neuroscience domain. See http://neuinfo.org

Web Ontology Language (OWL), continued
- OBO Foundry, a suite of interoperable reference ontologies in biomedicine
- Ontology for Biomedical Investigations, an open access, integrated ontology for the description of biological and clinical investigations
- OMNIBUS Ontology, an ontology of learning, instruction, and instructional design
- Plant Ontology for plant structures and growth/development stages, etc.
- POPE, Purdue Ontology for Pharmaceutical Engineering
- PRO, the Protein Ontology of the Protein Information Resource, Georgetown University
- Program abstraction taxonomy
- Protein Ontology for proteomics
- Systems Biology Ontology (SBO), for computational models in biology
- Many more (ONIX, MARC, Dublin Core)

Why is it important to us?
- It is the future
- All major governments have made adoption mandatory
- All big businesses have adopted it
- The scope in all areas, and especially in publishing, is huge
- It fundamentally changes what we are doing
- Our customers have adopted it
- It presents new opportunities

Market
The market for semantic enrichment will be much larger. Some of the industries/sectors are:
- Publishing: Books, Journals
- Media & Entertainment
- Banking, Finance, Insurance
- Pharmaceutical
- Medical
- Government
- Legal

Semantic Opportunities
- Content Abstraction
- Technical Data Extraction
- Keyword/Semantic Indexing
- Bibliographic Data Management
- Editorial Services
- Taxonomy, Thesaurus, Ontology, Terminology
- Annotation, Recommendation Creation
- Semantic Tagging
- Semantic Linking
- Researched Linking
- Resource Repurposing

Content Abstraction
Content abstraction is the process of creating a condensed version of a full-text article or other technical or research document. An abstract gives the reader an indication of the core themes discussed in the full text. Publishers use it as a document surrogate to promote the delivery and sales of full-text documents.
- Indicative Abstracts: Discuss what the article indicates in terms of topic and methodology, without providing the key content present in the article. Examples: product reviews, book abstracts.
- Informative Abstracts: Provide a condensed view of the entire content of the full-text document, culling out the key topics and concepts covered. Examples: abstracts of technical articles, technical standards and specifications.
- Structured Abstracts: Abstracts created in a structured format with pre-defined headings that represent the way the full text is organized. Examples: abstracts of clinical trials and medical case reports. Here the abstracts follow the typical structure of Introduction/Background, Scope/Methods, Results/Discussion/Conclusion, based on the specific house style followed by the publisher or information provider.
- Enhanced/Value-Added Abstracts: Abstracts that use domain expertise and inference to pick out the key knowledge that is helpful for decision making. Examples: English abstracts of patents in multiple languages that extract the key patentability parameters such as novelty, use and advantage; bottom-line summaries and clinical pearls.

Technical Data Extraction
Technical data extraction is the process of extracting properties, attributes, metadata and conceptual entities from unstructured technical documents such as patents and non-patent technical literature. A few examples of data that can be extracted from typical chemical and life science documents are:
- Systematic chemical names (IUPAC nomenclature) with different spellings: commas, periods, hyphens, parentheses, apostrophes, plusses, minuses and Greek symbols
- Common or generic names
- Trade names
- Company codes
- Abbreviations
- Fragmented descriptors
- Molecular formula
- Genetic information
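As one narrow illustration of such extraction, the sketch below pulls candidate molecular formulas out of free text. It is a simplification under stated assumptions: the `ELEMENTS` set is deliberately partial, the `extract_formulas` function is hypothetical, and formulas without any digit (such as NaCl) are skipped on purpose to limit false positives.

```python
import re

# Deliberately partial element list; a real extractor would need the full periodic table.
ELEMENTS = {"H", "C", "N", "O", "S", "P", "Na", "Cl", "K", "Ca", "Fe"}

# A candidate token is two or more "symbol plus optional count" parts, e.g. C6H12O6.
TOKEN = re.compile(r"\b(?:[A-Z][a-z]?\d*){2,}\b")
PART = re.compile(r"([A-Z][a-z]?)(\d*)")

def extract_formulas(text):
    """Return tokens whose parts are all known element symbols and include a count."""
    formulas = []
    for match in TOKEN.finditer(text):
        parts = PART.findall(match.group(0))
        known = all(symbol in ELEMENTS for symbol, _ in parts)
        has_count = any(count for _, count in parts)
        if known and has_count:
            formulas.append(match.group(0))
    return formulas

sample = "Glucose (C6H12O6) dissolves in H2O; DNA was not affected."
formulas = extract_formulas(sample)
print(formulas)
```

Checking each part against a list of element symbols is what keeps an abbreviation such as DNA from being mistaken for a formula.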

Keyword/Semantic Indexing
Indexing is a process in which the key descriptors that represent the core theme of an article or document are extracted and the article or document is tagged with those descriptors. The descriptors can be keywords that are actually present in the document (keyword indexing) or descriptors that represent the key concepts elaborated in the article but are not necessarily present in the document (semantic indexing). Some of the areas are:
- Journal Indexing
- Subject Category Indexing
- Image Indexing
- Medical Indexing and Coding / Evidence-Based Rating
- Drug Indexing
- Chemical Structure Drawing and Indexing
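The difference between the two kinds of indexing can be shown with a small sketch. The `CONCEPTS` mini-thesaurus, the sample sentence and both functions are hypothetical and only illustrate the distinction; production indexing relies on curated vocabularies and human review.

```python
import re

# Hypothetical mini-thesaurus: descriptor -> phrases in text that signal the concept.
CONCEPTS = {
    "Myocardial Infarction": ["heart attack", "myocardial infarct"],
    "Hypertension": ["high blood pressure"],
}

def contains(text, phrase):
    """True if `phrase` occurs in `text` as a whole phrase, ignoring case."""
    return re.search(r"\b" + re.escape(phrase.lower()) + r"\b", text.lower()) is not None

def keyword_index(text, vocabulary):
    """Keyword indexing: keep the vocabulary terms that literally occur in the text."""
    return sorted(term for term in vocabulary if contains(text, term))

def semantic_index(text, concepts):
    """Semantic indexing: assign descriptors whose concept is discussed, even if unnamed."""
    return sorted(d for d, phrases in concepts.items() if any(contains(text, p) for p in phrases))

text = "Patients with high blood pressure face a greater risk of heart attack."
keywords = keyword_index(text, ["heart attack", "stroke"])
descriptors = semantic_index(text, CONCEPTS)
print(keywords, descriptors)
```

Neither "Hypertension" nor "Myocardial Infarction" appears in the sentence, yet both are assigned as descriptors because the concepts are present.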

Bibliographic Data Management
This includes developing, validating, updating and editing bibliographic databases based on the cataloguing rules of leading bibliographic databases and catalogs such as ISSN and OCLC. It should also include ONIX and RSS feeds.

Editorial Services
- Editorial Workflow Administration: Handle the entire manuscript process, including peer reviewer selection, tracking of manuscripts, reminders to peer reviewers, and style checking of manuscripts.
- Developmental Editing: Work in tandem with authors in editing and fine-tuning their manuscripts. Provide services such as fact checking and content enrichment to enhance the authenticity and readability of the manuscript.
- Content Editing
- Language Editing
- Technical Editing
- Proofreading

Editorial Services for Business and Commercial News Services:
- News Summaries
- Press Report Analysis
- Newsletters
- Media Monitoring
- Product and Service Descriptions

Taxonomy, Thesaurus, Ontology, Terminology
The offerings should include the following:
- Taxonomy development and maintenance
- Taxonomy mapping/integration
- Taxonomy expansion
- Semantic labeling of taxonomy nodes through ontology
- Development of niche taxonomies for medical specialties
- Automated content mining and vertical search solutions through the deployment of taxonomy and ontology
- Lexicon development: word variants, spelling variants, morphological variants, language variants
- Thesaurus development: multilevel broader and narrower terms, hierarchical displays, construction of equivalent terms (synonyms), construction of associated terms (related terms)
- Ontology development: conceptual definition for each node, disambiguation of homonyms, deconstruction of existing taxonomies and semantic labeling of taxonomy nodes
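The thesaurus relationships listed above (broader and narrower terms, equivalent terms, associated terms) map naturally onto a small data structure. The sketch below is one possible shape, with a hypothetical `Concept` class and invented sample terms; it is not tied to SKOS or to any specific thesaurus tool.

```python
from dataclasses import dataclass, field

@dataclass
class Concept:
    preferred: str
    synonyms: list = field(default_factory=list)  # equivalent terms
    broader: list = field(default_factory=list)   # parent concepts
    related: list = field(default_factory=list)   # associated terms

# Hypothetical sample thesaurus, keyed by preferred term.
thesaurus = {
    "Publications": Concept("Publications"),
    "Periodicals": Concept("Periodicals", synonyms=["Serials"], broader=["Publications"]),
    "Journals": Concept("Journals", synonyms=["Scholarly journals"],
                        broader=["Periodicals"], related=["Peer review"]),
}

def resolve(term):
    """Map a term or one of its synonyms to the preferred term, or None if unknown."""
    wanted = term.lower()
    for concept in thesaurus.values():
        if wanted == concept.preferred.lower() or wanted in (s.lower() for s in concept.synonyms):
            return concept.preferred
    return None

def narrower(label):
    """Return all narrower terms of `label`, at every level of the hierarchy."""
    found = []
    for concept in thesaurus.values():
        if label in concept.broader:
            found.append(concept.preferred)
            found.extend(narrower(concept.preferred))
    return found

print(resolve("serials"), narrower("Publications"))
```

Storing only the broader links and deriving the narrower ones keeps the hierarchy consistent, since each relationship is recorded once.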

Annotation/Recommendation Creation
- Annotation Creation: During this process the data from the databases is annotated semantically. The process makes heterogeneous collection data syntactically and semantically interoperable.
- Recommendation Creation: Rules that define more associative relations between different metadata items need to be created. These rules are based on the domain ontologies, the collection item annotations, and expert knowledge.
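A recommendation rule of this kind can be sketched as follows. The rule shown (recommend items whose concepts share a parent class) is just one assumed example; the item names, the `ex:` concepts and the `recommend` function are hypothetical.

```python
# Hypothetical collection item annotations: item -> concepts it is tagged with.
annotations = {
    "article-1": {"ex:Aspirin"},
    "article-2": {"ex:Ibuprofen"},
    "article-3": {"ex:Taxonomy"},
}

# Hypothetical domain ontology fragment: concept -> parent class.
parent = {
    "ex:Aspirin": "ex:NSAID",
    "ex:Ibuprofen": "ex:NSAID",
    "ex:Taxonomy": "ex:KnowledgeOrganizationSystem",
}

def recommend(item):
    """Rule: recommend other items annotated with a concept that shares a parent class."""
    parents = {parent[c] for c in annotations[item] if c in parent}
    return sorted(
        other
        for other, concepts in annotations.items()
        if other != item and any(parent.get(c) in parents for c in concepts)
    )

suggestions = recommend("article-1")
print(suggestions)
```

The two drug articles share no tag, so plain tag matching would not connect them; the association comes from the ontology.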

Semantic Services @ SPS
- Semantic Tagging Services: We can offer content transformation with semantic tagging.
- Semantic Linking Services: We can offer semantic linking of the semantic tags with external objects, resources or databases.
- Researched Linking Services: In addition to the above, our teams can research disparate information on the internet, covering the objects, resources and databases mentioned above, which can then be linked to the content.
- Resource Repurposing/Rebuilding Services: We can also repurpose or rebuild resources and objects such as images, graphs, charts, tables, animations, audio and video.

Thank You
Avinash Punekar
avinash@sps.co.in
Phone: +91 91766 50335
Scientific Publishing Services